= sycl_ext_oneapi_atomic16

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes.  This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2022 Intel Corporation.  All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc.  OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 6 specification.  All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

The `sycl::ext::oneapi::bfloat16` class is provided by the
link:../supported/sycl_ext_oneapi_bfloat16.asciidoc[sycl_ext_oneapi_bfloat16]
extension.


== Status

This is a proposed extension specification, intended to gather community
feedback.  Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state.  The specification itself may also change in
incompatible ways before it is finalized.  *Shipping software products should
not rely on APIs defined in this specification.*


== Overview

The SYCL 2020 specification only provides support for 32-bit and 64-bit
atomics. 32-bit atomics are guaranteed to be supported for all devices, while
64-bit atomics are an optional feature.

This extension introduces 16-bit atomics as a new optional feature. It adds a
new aspect to identify devices with support for 16-bit atomics, and adds
new specializations of `sycl::atomic_ref` for 16-bit types including `short`,
`unsigned short`, `sycl::half` and `sycl::ext::oneapi::bfloat16`.

== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification.  An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_ATOMIC16` to one of the values defined in the
table below.  Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro's value to determine which of the extension's features the
implementation supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
 feature-test macro always has this value.
|===

=== Optional support for 16-bit atomics

Support for 16-bit atomics is an _optional feature_, as described in
link:https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:optional-kernel-features[Section 5.7]
of the SYCL specification.

This extension introduces a new aspect, `aspects::ext_oneapi_atomic16`, to
identify devices with support for 16-bit atomics.

[%header,cols="1,5"]
|===
|Aspect
|Description

|`aspect::ext_oneapi_atomic16`
|Indicates that kernels submitted to the device may perform 16-bit atomic
 operations.
|===


=== Specializations of `sycl::atomic_ref`

This extension permits the following types to be used to create atomic
references:

- `short` when `sizeof(short) == 2`
- `unsigned short` when `sizeof(unsigned short) == 2`
- `sycl::half`
- `sycl::ext::oneapi::bfloat16`

[NOTE]
====
The `sycl::ext::oneapi::bfloat16` class is only available for implementations
supporting the `sycl_ext_oneapi_bfloat16` extension.
====

For `sycl::half` and `sycl::ext::oneapi::bfloat16`, the member functions of the
`atomic_ref` class may be emulated, and may use a different floating-point
environment to that used by device kernels.


== Implementation notes

Native instruction support for 16-bit integer atomics is sufficient to
implement the `sycl::atomic_ref` specializations for `sycl::half` and
`sycl::ext::oneapi::bfloat16`, because all atomic arithmetic operations can be
implemented in terms of atomic load, store and compare-exchange instructions.

== Issues

None.