# sycl_ext_oneapi_bindless_images

:source-highlighter: coderay
:coderay-linenums-mode: table
:dpcpp: pass:[DPC++]

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]
// Set the default source code type in this document to C++,
// for syntax highlighting purposes.  This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) Codeplay. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc.  OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Dependencies

This extension is written against the SYCL 2020 revision 6 specification.  All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

== Status

This is an experimental extension specification, intended to provide early
access to features and gather community feedback.  Interfaces defined in this
specification are implemented in {dpcpp}, but they are not finalized and may
change incompatibly in future versions of {dpcpp} without prior notice.
*Shipping software products should not rely on APIs defined in this
specification.*


== Backend support status

This extension is currently implemented in {dpcpp} only for GPU devices and
only when using the CUDA backend.  Attempting to use this extension in
kernels that run on other devices or backends will not work. 
Be aware that the compiler may not be able to issue a diagnostic to
warn you if this happens.

== Overview

Images in SYCL 1.2.1 were designed to work with OpenCL. SYCL 2020 tried to make 
them more versatile by splitting the image type into sampled and unsampled 
images. This enabled SYCL 2020 images to work better with other backends.
However, SYCL 2020 images still didn't quite meet user expectations.
There was feedback about various use cases where the current model falls short
(see examples at the end of this document for some of the use cases).

One of the key issues is requesting access to arbitrary images through handles, 
and not accessors. Accessing images through handles instead of accessors grants 
much more flexibility to the user, at the expense of automatic data dependency 
tracking. Bypassing accessors allows users to implement programs where the 
number of images is not known at compile-time, such as a texture atlas where one 
image holds references to other images. This kind of feature is impossible to 
implement with the accessor model outlined in the core specification.

These shortcomings are why we propose a new extension for SYCL 2020 images.
Per our proposal, users would be able to separate memory allocation for the 
image from the actual image creation. Images will be represented by opaque 
handle types that can be passed directly into a kernel without requesting 
access. In many ways, this model more closely resembles the USM model when 
accessing data on the device, but it's specialized for dealing with images.

The proposed model does not replace SYCL 2020 images,
it is instead meant as building blocks for implementing SYCL 2020 images on 
top of it.

In addition to bindless images, this document also proposes an interoperability 
extension providing functionality to allow users to import external memory and 
semaphore objects from other APIs, such as Vulkan or DirectX. 

Importing memory allows it to be shared between APIs without the need to 
duplicate allocations and perform multiple copies between host and device to 
ensure that said memory is kept uniform across those APIs at all times.

Importing semaphores will also allow SYCL to schedule command groups and queue 
operations that depend on completion of GPU commands submitted by external APIs.

[NOTE]
====
The interoperability outlined in this document concerns only the importing of 
external API objects into the SYCL runtime. We do not expose exportation of SYCL 
objects to external APIs. Interoperability capabilities vary between APIs. For 
example, CUDA allows the import of external memory and semaphores, but does not 
allow export of its own resources.
====

== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_BINDLESS_IMAGES` to one of the values defined in the
table below. Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro's value to determine which of the extension's features the
implementation supports.

[frame="none",options="header"]
|======================
|Rev | Description
|1   | Initial draft of the proposal
|2   | Second revision of the proposal
|3   | Third revision of the proposal
|4   | Fourth revision of the proposal
|5   | Fifth revision of the proposal
|6   | Sixth revision of the proposal
|======================

See the revision history at the bottom of this document for features added in 
each revision.

=== Querying bindless image support

We provide the following device queries to retrieve information on whether a 
SYCL implementation provides support for various bindless image features.

The device aspects for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor |Description
|`aspect::ext_oneapi_bindless_images` | Indicates if the device supports 
bindless images. This includes creating bindless images backed by the
`image_mem` and `image_mem_handle` APIs.
|`aspect::ext_oneapi_bindless_images_shared_usm` | Indicates if the device 
supports the creation of bindless images backed by shared USM memory.
|`aspect::ext_oneapi_bindless_images_1d_usm` | Indicates if the device supports 
creation of 1D bindless images backed by USM.
|`aspect::ext_oneapi_bindless_images_2d_usm` | Indicates if the device supports 
creation of 2D bindless images backed by USM.
|======================

[NOTE]
====
Not all SYCL backends may provide support for bindless images constructed from 
USM memory with all dimensions. As an example, CUDA does not have 
native support for 3D image resources constructed from USM. In the future, some
backends may support this, and this proposal may be updated to allow creation 
of 3D USM images.
====

=== Image descriptor

```cpp
namespace sycl::ext::oneapi::experimental {

enum class image_channel_type : /* unspecified */ {
  snorm_int8,
  snorm_int16,
  unorm_int8,
  unorm_int16,
  signed_int8,
  signed_int16,
  signed_int32,
  unsigned_int8,
  unsigned_int16,
  unsigned_int32,
  fp16,
  fp32,
};

enum class image_type : /* unspecified */ {
  standard,
  mipmap,
  array,
  cubemap,
};

struct image_descriptor {
  size_t width{0};
  size_t height{0};
  size_t depth{0};
  unsigned int num_channels{4};
  image_channel_type channel_type{image_channel_type::fp32};
  image_type type{image_type::standard};
  unsigned int num_levels{1};
  unsigned int array_size{1};

  image_descriptor() = default;

  image_descriptor(sycl::range<1> dims, unsigned int num_channels,
                   image_channel_type channel_type, 
                   image_type type = image_type::standard, 
                   unsigned int num_levels = 1, unsigned int array_size = 1);

  image_descriptor(sycl::range<2> dims, unsigned int num_channels,
                   image_channel_type channel_type, 
                   image_type type = image_type::standard, 
                   unsigned int num_levels = 1, unsigned int array_size = 1);

  image_descriptor(sycl::range<3> dims, unsigned int num_channels,
                   image_channel_type channel_type, 
                   image_type type = image_type::standard, 
                   unsigned int num_levels = 1, unsigned int array_size = 1);

  image_descriptor get_mip_level_desc(unsigned int level) const;

  void verify() const;
};

}
```

The image descriptor represents the image dimensions, number of channels, and
channel type. An `image_type` member is also present to allow for implementation
of mipmapped, image array, and cubemapped images.

The `image_descriptor` shall be default constructible and follow by-value 
semantics.

[NOTE]
====
Additional future `image_type`s _may_ include combined image types like 
"mipmapped cubemap".
====

Note that `image_channel_type` and `image_channel_order` existed in SYCL 1.2.1,
but were removed in SYCL 2020 in favor of a single, unified enum class.
We propose separating them again to enable better flexibility
and to avoid combinatorial complexity.

The `verify` member function is available to check the validity of the image
descriptor against the limitations outlined below. If the given descriptor is 
deemed invalid, then a `sycl::exception` will be thrown with error code 
`sycl::errc::invalid`.

The value of `num_channels` supported by all image types and backends
is  `1`, `2`, or `4`.
Some backends also support `num_channels` to be `3`.

For the `standard` image type, the value of `num_levels` and `array_size` must
both be `1`.

The `type` member will inform the implementation of the type of image to 
create, allocate, or free.

Only mipmap image types support more than one level. For mipmap images, the 
member function `get_mip_level_desc` will return an `image_descriptor` for 
a given level of a mipmap, with valid dimension values for that level, and the 
type of the returned `image_descriptor` will be `image_type::standard`.

Only array image types support more than one array layer.

=== Allocating image memory

The process of creating an image is two-fold:
allocate an image's memory, then create an image handle from the allocation.
Allocation of image memory can be achieved in two ways. 

==== Allocating non-USM image memory 

```cpp
namespace sycl::ext::oneapi::experimental {

struct image_mem_handle {
  using raw_handle_type = /* implementation defined */;
  raw_handle_type raw_handle;
}

class image_mem {
public:
  image_mem();
  image_mem(const image_mem &rhs);
  image_mem(image_mem &&rhs) noexcept;

  image_mem(const image_descriptor &imageDesc,
            const sycl::device &syclDevice, 
            const sycl::context &syclContext);
  image_mem(const image_descriptor &imageDesc,
            const sycl::queue &syclQueue);

  ~image_mem();

  image_mem &operator=(image_mem &rhs);
  image_mem &operator=(image_mem &&rhs) noexcept;

  bool operator==(const image_mem &rhs) const;
  bool operator!=(const image_mem &rhs) const;

  image_mem_handle get_handle() const;
  image_descriptor get_descriptor() const;
  sycl::device get_device() const;
  sycl::context get_context() const;

  sycl::range<3> get_range() const;
  sycl::image_channel_type get_channel_type() const;
  unsigned int get_num_channels() const;
  image_type get_type() const;

  image_mem_handle get_mip_level_mem_handle(unsigned int level) const;
};

image_mem_handle alloc_image_mem(const image_descriptor &imageDesc,
                                 const sycl::device &syclDevice,
                                 const sycl::context &syclContext);
image_mem_handle alloc_image_mem(const image_descriptor &imageDesc,
                                 const sycl::queue &syclQueue);

void free_image_mem(image_mem_handle memHandle,
                    image_type imageType,
                    const sycl::device &syclDevice,
                    const sycl::context &syclContext);
void free_image_mem(image_mem_handle memHandle,
                    image_type imageType,
                    const sycl::queue &syclQueue);
}
```

The first method of allocating device memory for images is through 
`alloc_image_mem`. This takes a `sycl::device`, `sycl::context`, 
and `image_descriptor` to allocate device memory, with the appropriate image 
type and size based on the `image_descriptor`. Alternatively, we can also pass a 
`sycl::queue` instead of both `sycl::device` and `sycl::context`.

Memory allocated in this way requires the user to free that memory after all 
operations using the memory are completed and no more operations operating on 
the memory will be scheduled. This is done using `free_image_mem`. An 
`image_type` should be passed to `free_image_mem` to inform the implementation 
of the type of memory to be freed.

The second method involves the `image_mem` class, which is a RAII class wrapper
that performs allocation and deallocation of device memory.

The default constructor does not allocate any memory on the device and the 
resulting `image_mem` object is in an uninitialized state.

the constructor is a wrapper for `alloc_image_mem` functionality.
The destructor is a wrapper for `free_image_mem` functionality.

`image_mem` also provides some functions to get various properties of the 
image memory allocation such as the image range, channel type, channel order, 
number of channels, number of levels, and image type.

In the case where a mipmap has been allocated, `get_mip_level_mem_handle` can 
be used to return an `image_mem_handle` to a specific level of the mipmap. This 
can then be used to copy data to that specific level or create an image handle 
based on that level.

Note that the handle type `image_mem_handle::raw_handle_type` is an opaque type, 
and the handle cannot be dereferenced on the host. The layout of the memory is 
backend-specific, and may be an optimized layout, e.g. tile swizzle patterns.

The `image_mem` class must follow Common Reference Semantics as outlined by the 
core SYCL 2020 specification. 

The `image_mem` class is not a valid kernel argument.

If the construction of the `image_mem` class fails, a 
`sycl::exception` with error code `sycl::errc::memory_allocation` will be 
thrown.

Similarly, if `alloc_image_mem` or `free_image_mem` fail, a `sycl::exception` 
with error code `sycl::errc::memory_allocation` will be thrown.

[NOTE]
====
In the DPC++ CUDA backend, `image_mem` will allocate/deallocate a 
`CUarray` type (or `CUmipmappedArray` in the case of mipmap images).
====

===== Getting image information from `image_mem_handle`

Extension functions are provided to retrieve information about images allocated 
using the `image_mem_alloc` function. These are similar to the member functions 
provided by `image_mem`. However, since the `image_mem_handle` is a minimal 
struct representing just the opaque handle the underlying memory object, there 
is some information that we cannot retrieve from it, namely the `image_type`,
`num_channels`, the `sycl::context` or `sycl::device` the memory was
allocated in, and the `image_descriptor` used to allocate the memory.

```cpp
namespace sycl::ext::oneapi {

sycl::range<3> get_image_range(const image_mem_handle memHandle,
                               const sycl::device &syclDevice,
                               const sycl::context &syclContext);
sycl::range<3> get_image_range(const image_mem_handle memHandle,
                               const sycl::queue &syclQueue);

sycl::image_channel_type
get_image_channel_type(const image_mem_handle memHandle,
                       const sycl::device &syclDevice,
                       const sycl::context &syclContext);
sycl::image_channel_type
get_image_channel_type(const image_mem_handle memHandle,
                       const sycl::queue &syclQueue);

unsigned int get_image_num_channels(const image_mem_handle memHandle,
                                    const sycl::device &syclDevice,
                                    const sycl::context &syclContext);
unsigned int get_image_num_channels(const image_mem_handle memHandle,
                                    const sycl::queue &syclQueue);

image_mem_handle get_mip_level_mem_handle(const image_mem_handle mipMemHandle,
                                          unsigned int level, 
                                          const sycl::device &syclDevice,
                                          const sycl::context &syclContext);
image_mem_handle get_mip_level_mem_handle(const image_mem_handle mipMemHandle,
                                          unsigned int level,
                                          const sycl::queue &syclQueue);
}
```

For `get_image_range` where the underlying image memory was allocated with one 
or two dimensions, the returned `sycl::range<3>` will contain zero values for 
the dimensions unused by the underlying image memory object.

==== Allocating USM image memory 

The second way to allocate image memory is to use USM allocations. SYCL already 
provides a number of USM allocation functions. This proposal would add another,
pitched memory allocation, through `pitched_alloc_device`.

Bindless images can be backed by device, host, or shared USM memory allocations.

[NOTE]
====
Image memory backed by USM device and host allocations is generally supported,
whereas shared USM allocations depend on the SYCL backend as well as the device
capabilities.
====

```cpp
namespace sycl::ext::oneapi::experimental {

void *pitched_alloc_device(size_t *retRowPitch, 
                           size_t widthInBytes, size_t height,
                           unsigned int elementSizeBytes, 
                           const sycl::queue &syclQueue);
void *pitched_alloc_device(size_t *retRowPitch
                           size_t widthInBytes, size_t height,
                           unsigned int elementSizeBytes, 
                           const sycl::device &syclDevice, 
                           const sycl::context &syclContext);

void *pitched_alloc_device(size_t *resultPitch,
                           const image_descriptor &desc,
                           const sycl::queue &queue);

void *pitched_alloc_device(size_t *resultPitch,
                           const image_descriptor &desc,
                           const sycl::device &syclDevice,
                           const sycl::context &syclContext);
}
```

This function will allocate a memory region aimed to be used for 
two-dimensional images. It allocates memory that is guaranteed to 
adhere to the device's alignment requirements for USM images.

If the user does not wish to use `pitched_alloc_device` to allocate 
two-dimensional USM images, but prefers to use another USM allocation 
function instead, then that allocation must adhere to some alignment 
restrictions. These restrictions are device specific, and queries for them can 
be found in the "Pitch alignment restrictions and queries" section below.

If the allocation of pitched memory fails, `pitched_alloc_device` will throw a 
`sycl::exception` with error code `sycl::errc::memory_allocation`.

=== Pitch alignment restrictions and queries

For the purposes of this document, the row pitch of an image memory allocation 
is the distance in bytes between the first elements of adjacent rows of the 
image. Some devices may require two-dimensional USM images to be allocated with 
specific alignments for their width and pitch values. The `pitched_alloc_device`
API intends to make allocation of USM memory adhering to these restrictions 
easy, returning the appropriate pitch value to the user. However, if a user 
wishes to use another USM allocation function, they must be aware of these 
restrictions, and query the device to ensure the allocations they wish to use 
adhere to those restrictions.

This proposal provides a number of additional device queries that enable the 
user to allocate appropriate pitched USM memory for two-dimensional 
images. One-dimensional images do not require any pitch values.

The device information descriptors for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor |Return type |Description
|`ext::oneapi::experimental::info::device::image_row_pitch_align` |`uint32_t` | 
Returns the required alignment of the pitch between two rows of an image in 
bytes for images allocated using USM.
|`ext::oneapi::experimental::info::device::max_image_linear_width` |`size_t` | 
Returns the maximum linear width allowed for images allocated using USM.
|`ext::oneapi::experimental::info::device::max_image_linear_height` |`size_t` | 
Returns the maximum linear height allowed for images allocated using USM.
|`ext::oneapi::experimental::info::device::max_image_linear_row_pitch` 
|`size_t` | Returns the maximum linear row pitch allowed for images allocated 
using USM.
|======================

=== Obtaining a handle to the image

The next step is to create the image, and obtain the handle.

```cpp
namespace sycl::ext::oneapi::experimental {

/// Opaque unsampled image handle type.
struct unsampled_image_handle {
  using raw_image_handle_type = /* Implementation defined */;

  unsampled_image_handle();
  unsampled_image_handle(raw_image_handle_type raw_handle);

  raw_image_handle_type raw_handle;
};

/// Opaque sampled image handle type.
struct sampled_image_handle {
  using raw_image_handle_type = /* Implementation defined */;

  sampled_image_handle();
  sampled_image_handle(raw_image_handle_type raw_image_handle);

  raw_image_handle_type raw_handle;
};

// Creating an unsampled image from an `image_mem_handle`
unsampled_image_handle create_image(image_mem_handle memHandle,
                                    const image_descriptor &desc,
                                    const sycl::device &syclDevice,
                                    const sycl::context &syclContext);
unsampled_image_handle create_image(image_mem_handle memHandle,
                                    const image_descriptor &desc,
                                    const sycl::queue &syclQueue);

// Creating a sampled image from an `image_mem_handle`
sampled_image_handle create_image(image_mem_handle memHandle,
                                  const image_descriptor &desc,
                                  const bindless_image_sampler &sampler,
                                  const sycl::device &syclDevice,
                                  const sycl::context &syclContext);
sampled_image_handle create_image(image_mem_handle memHandle,
                                  const image_descriptor &desc,
                                  const bindless_image_sampler &sampler,
                                  const sycl::queue &syclQueue);

// Creating an unsampled image from an `image_mem` object
unsampled_image_handle create_image(const image_mem &memHandle,
                                    const image_descriptor &desc,
                                    const sycl::device &syclDevice,
                                    const sycl::context &syclContext);
unsampled_image_handle create_image(const image_mem &memHandle,
                                    const image_descriptor &desc,
                                    const sycl::queue &syclQueue);

// Creating a sampled image from an `image_mem` object
sampled_image_handle create_image(const image_mem &memHandle,
                                  const image_descriptor &desc,
                                  const bindless_image_sampler &sampler,
                                  const sycl::device &syclDevice,
                                  const sycl::context &syclContext);
sampled_image_handle create_image(const image_mem &memHandle,
                                  const image_descriptor &desc,
                                  const bindless_image_sampler &sampler,
                                  const sycl::queue &syclQueue);

// Creating a sampled image from a USM allocation and pitch
sampled_image_handle create_image(const void *usmPtr, size_t pitch,
                                  const image_descriptor &desc,
                                  const bindless_image_sampler &sampler,
                                  const sycl::device &syclDevice,
                                  const sycl::context &syclContext);
sampled_image_handle create_image(const void *usmPtr, size_t pitch,
                                  const image_descriptor &desc,
                                  const bindless_image_sampler &sampler,
                                  const sycl::queue &syclQueue);

// Destroying an image handle
void destroy_image_handle(sampled_image_handle &imageHandle,
                          const sycl::device &syclDevice,
                          const sycl::context &syclContext);
void destroy_image_handle(sampled_image_handle &imageHandle,
                          const sycl::queue &syclQueue);

void destroy_image_handle(unsampled_image_handle &imageHandle,
                          const sycl::device &syclDevice,
                          const sycl::context &syclContext);
void destroy_image_handle(unsampled_image_handle &imageHandle,
                          const sycl::queue &syclQueue);
}
```

Once we have allocated memory, we can pass it into the `create_image` function
to obtain a `sampled_image_handle` or `unsampled_image_handle`.
These objects are opaque types that represent an image object.
They can be captured by value into a SYCL kernel, or they can be passed in a 
buffer as a dynamic array of images (see examples at the bottom of this 
document).

We can either provide a `bindless_image_sampler` (defined in section below) or 
not when creating the image. Doing so will create a `sampled_image_handle`, 
where otherwise an `unsampled_image_handle` would be returned. A 
`sampled_image_handle` should contain a raw sampler handle that will be used 
when sampling an image.

Whether an `image_descriptor` or `void *` USM allocation was passed to 
`create_image`, it must have been allocated in the same context and on the same 
device as the one passed to `create_image`.

If we choose to create a 2D image from a USM allocation by passing a `void *`, 
we must also pass the pitch of the memory allocation. If the memory was 
allocated using `pitched_alloc_device`, the pitch passed must be the one which 
was returned by `pitched_alloc_device`. If the user did not use 
`pitched_alloc_device` to allocate this memory, then that memory must still 
adhere to device specific alignment restrictions. These restrictions and their 
queries are outlined in the section "Pitch alignment restrictions and queries" 
below.

The pitch is ignored for 1D USM images.

If the creation of an image fails, `create_image` will throw a `sycl::exception` 
with error code `sycl::errc::runtime`.

The `unsampled_image_handle` and `sampled_image_handle` types shall be 
default-constructible, copy-constructible, and device-copyable. When default 
constructed, image handles are not valid until a user manually assigns a valid 
`raw_image_handle_type` to the `raw_handle` field of the handle struct. The 
default value of the `raw_handle` is implementation defined.

The `unsampled_image_handle` and `sampled_image_handle` types have a 
constructor to allow creation of the types from a `raw_image_handle_type`

[NOTE]
====
In the DPC++ CUDA backend a sampled image will correspond to a CUDA texture, 
whereas an unsampled image will correspond to a CUDA surface.
====

After we're done with the image, we need to destroy the handle using 
`destroy_image_handle`. Destroying an image handle does not deallocate the 
underlying image memory. The user is responsible for deallocation, either 
through `free_image_mem`, or destroying the `image_mem` object, if one was used.

=== Image sampler struct

The `bindless_image_sampler` struct shown below is used to set the sampling 
properties of `sampled_images` upon image creation. It can be used to set 
sampling properties that exist in the SYCL 2020 `image_sampler` as well as 
extra properties used for sampling additional image types including 
level-of-detail (LOD) and anisotropic filtering for mipmaps, and seamless 
filtering for cubemaps.

```cpp
namespace sycl::ext::oneapi::experimental {

enum class cubemap_filtering_mode : /* unspecified */ {
  disjointed,
  seamless,
};

struct bindless_image_sampler {

  // Assign addressing mode to all dimensions
  bindless_image_sampler(sycl::addressing_mode addressing,
                         sycl::coordinate_normalization_mode coordinate,
                         sycl::filtering_mode filtering);

  bindless_image_sampler(sycl::addressing_mode addressing,
                         sycl::coordinate_normalization_mode coordinate,
                         sycl::filtering_mode filtering,
                         sycl::filtering_mode mipFiltering,
                         float minMipmapLevelClamp, float maxMipmapLevelClamp,
                         float maxAnisotropy);

  bindless_image_sampler(sycl::addressing_mode addressing,
                         sycl::coordinate_normalization_mode coordinate,
                         sycl::filtering_mode filtering,
                         cubemap_filtering_mode cubemapFiltering);
                         
  // Specific addressing modes per dimension
  bindless_image_sampler(sycl::addressing_mode addressing[3],
                         sycl::coordinate_normalization_mode coordinate,
                         sycl::filtering_mode filtering);

  bindless_image_sampler(sycl::addressing_mode addressing[3],
                         sycl::coordinate_normalization_mode coordinate,
                         sycl::filtering_mode filtering,
                         sycl::filtering_mode mipmapFiltering,
                         float minMipmapLevelClamp, float maxMipmapLevelClamp,
                         float maxAnisotropy);
  
  bindless_image_sampler(sycl::addressing_mode addressing[3],
                         sycl::coordinate_normalization_mode coordinate,
                         sycl::filtering_mode filtering,
                         cubemap_filtering_mode cubemapFiltering);

  sycl::addressing_mode addressing[3] = {sycl::addressing_mode::none};
  sycl::coordinate_normalization_mode coordinate =
      sycl::coordinate_normalization_mode::unnormalized;
  sycl::filtering_mode filtering = sycl::filtering_mode::nearest;
  sycl::filtering_mode mipmap_filtering = sycl::filtering_mode::nearest;
  float min_mipmap_level_clamp = 0.f;
  float max_mipmap_level_clamp = 0.f;
  float max_anisotropy = 0.f;
  ext::oneapi::experimental::cubemap_filtering_mode cubemap_filtering = 
    cubemap_filtering_mode::disjointed;
};

}
```

The `bindless_image_sampler` shall be default constructible and follow by-value 
semantics. The value for the addressing mode, `addressing_mode::none`, 
represents the backend's default addressing mode. On CUDA this is `Wrap`, i.e. 
`addressing_mode::repeat`.

We propose renaming `addressing_mode::clamp` to
`addressing_mode::clamp_to_border`. This name aligns better with terms used in
other APIs and is more descriptive as to what the addressing mode does. Note
that in this extension, the addressing mode will be named 
`addressing_mode::ext_oneapi_clamp_to_border` as to comply with extension naming
guidelines.

`addressing[3]` defines the addressing mode per texture dimension. A 
`bindless_image_sampler` can be constructed with a singular 
`sycl::addressing_mode`, where this parameter will define all dimensions.
Not all devices may support unique addressing per dimension.  We provide device
aspect queries for this in <<querying_unique_addressing_support>>

`mipmap_filtering` dictates the method in which sampling between mipmap 
levels is performed.

`min_mipmap_level_clamp` defines the minimum mipmap level from which we can 
sample, with the minimum value being 0.

`max_mipmap_level_clamp` defines the maximum mipmap level from which we can 
sample. This value cannot be higher than the number of allocated levels.

`max_anisotropy` dictates the anisotropic ratio used when sampling the mipmap 
with anisotropic filtering.

`cubemap_filtering` dictates the method of sampling along cubemap face borders. 
Disjointed indicates no sampling between faces whereas seamless indicates that 
sampling across face boundaries is enabled.

[NOTE]
====
In CUDA, when seamless cubemap filtering is enabled, sampled image address modes 
specified are ignored. Instead, if the `filtering` mode is set to `nearest` the 
address mode `clamp_to_edge` will be applied for all dimensions. If the 
`filtering` mode is set to `linear` then seamless cubemap filtering will be 
performed when sampling along the cube face borders.
====

=== Explicit copies [[explicit_copies]]

```cpp
namespace sycl {

class handler {
public:

  // Simple host to device copy
  void ext_oneapi_copy(
      const void *Src,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental:image_descriptor &DestImgDesc);

  // Host to device copy with offsets and extent
  void ext_oneapi_copy(
      const void *Src,
      sycl::range<3> SrcOffset,
      sycl::range<3> SrcExtent,
      ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent);

  // Simple device to host copy
  void ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc);

  // Device to host copy with offsets and extent
  void ext_oneapi_copy(
    const ext::oneapi::experimental::image_mem_handle Src,
    sycl::range<3> SrcOffset,
    const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
    void *Dest,
    sycl::range<3> DestOffset,
    sycl::range<3> DestExtent,
    sycl::range<3> CopyExtent);

  // Simple HtoD or DtoH copy with USM device memory
  void ext_oneapi_copy(const void *Src,
                       void *Dest,
                       const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
                       size_t DeviceRowPitch);

  // Host to device or device to host copy with USM device memory with offsets
  // and extent
  void ext_oneapi_copy(
    const void *Src,
    sycl::range<3> SrcOffset,
    void *Dest,
    sycl::range<3> DestOffset,
    const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
    size_t DeviceRowPitch,
    sycl::range<3> HostExtent,
    sycl::range<3> CopyExtent);

  // Simple device to device opaque memory to opaque memory copy
  void ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc);

  // Device to device opaque memory to opaque memory copy with offsets and
  // extent
  void ext_oneapi_copy(
    const ext::oneapi::experimental::image_mem_handle Src,
    sycl::range<3> SrcOffset,
    const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
    ext::oneapi::experimental::image_mem_handle Dest,
    sycl::range<3> DestOffset,
    const ext::oneapi::experimental::image_descriptor &DestImgDesc,
    sycl::range<3> CopyExtent);

  // Simple device to device opaque memory to USM copy
  void ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc, void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch);

  // Device to device opaque memory to USM copy with offsets and extent
  void ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc, void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent);

  // Simple device to device USM to opaque memory copy
  void ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch, ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc);

  // Device to device USM to opaque memory copy with offsets and extent
  void ext_oneapi_copy(
      const void *Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch, ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent);

  // Simple device to device or host to host USM to USM copy
  void ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch);

  // Device to device or host to host USM to USM copy with offsets and extent
  void ext_oneapi_copy(
      const void *Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent);
};

class queue {
public:

  // Simple host to device copy
  event ext_oneapi_copy(const void *Src,
                        ext::oneapi::experimental::image_mem_handle Dest,
                        const ext::oneapi::experimental::image_descriptor &DestImgDesc);
  event ext_oneapi_copy(const void *Src,
                        ext::oneapi::experimental::image_mem_handle Dest,
                        const ext::oneapi::experimental::image_descriptor &DestImgDesc,
                        event DepEvent);
  event ext_oneapi_copy(const void *Src,
                        ext::oneapi::experimental::image_mem_handle Dest,
                        const ext::oneapi::experimental::image_descriptor &DestImgDesc,
                        const std::vector<event> &DepEvents);

  // Host to device copy with offsets and extent
  event ext_oneapi_copy(
    const void *Src,
    range<3> SrcOffset,
    range<3> SrcExtent,
    ext::oneapi::experimental::image_mem_handle Dest,
    range<3> DestOffset,
    const ext::oneapi::experimental::image_descriptor &DestImgDesc,
    range<3> CopyExtent);
  event ext_oneapi_copy(
    const void *Src,
    range<3> SrcOffset,
    range<3> SrcExtent,
    ext::oneapi::experimental::image_mem_handle Dest,
    range<3> DestOffset,
    const ext::oneapi::experimental::image_descriptor &DestImgDesc,
    range<3> Extent, event DepEvent);
  event ext_oneapi_copy(
    const void *Src,
    range<3> SrcOffset,
    range<3> SrcExtent,
    ext::oneapi::experimental::image_mem_handle Dest,
    range<3> DestOffset,
    const ext::oneapi::experimental::image_descriptor &DestImgDesc,
    range<3> CopyExtent, const std::vector<event> &DepEvents);

  // Simple device to host copy
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      event DepEvent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      const std::vector<event> &DepEvents);

  // Device to host copy with offsets and extent
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      range<3> DestOffset,
      range<3> DestExtent,
      range<3> CopyExtent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      range<3> DestOffset,
      range<3> DestExtent,
      range<3> CopyExtent, event DepEvent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest, 
      range<3> DestOffset,
      range<3> DestExtent,
      range<3> CopyExtent, const std::vector<event> &DepEvents);

  // Simple host to device or device to host copy with USM device memory
  event ext_oneapi_copy(
      const void *Src, void *Dest,
      const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
      size_t DeviceRowPitch);
  event ext_oneapi_copy(
      const void *Src, void *Dest,
      const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
      size_t DeviceRowPitch,
      event DepEvent);
  event ext_oneapi_copy(
      const void *Src, void *Dest,
      const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
      size_t DeviceRowPitch,
      const std::vector<event> &DepEvents);

  // Host to device or device to host copy with USM device memory with offsets
  // and extent
  event ext_oneapi_copy(
      const void *Src, sycl::range<3> SrcOffset,
      void *Dest, sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
      size_t DeviceRowPitch,
      sycl::range<3> HostExtent,
      sycl::range<3> CopyExtent);
  event ext_oneapi_copy(
      const void *Src, sycl::range<3> SrcOffset,
      void *Dest, sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
      size_t DeviceRowPitch,
      sycl::range<3> HostExtent,
      sycl::range<3> CopyExtent);
  event ext_oneapi_copy(
      const void *Src, sycl::range<3> SrcOffset,
      void *Dest, sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
      size_t DeviceRowPitch,
      sycl::range<3> HostExtent,
      sycl::range<3> CopyExtent);
  
  // Simple device to device opaque memory to opaque memory copy
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      event DepEvent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      const std::vector<event> &DepEvents);

  // Device to device opaque memory to opaque memory copy with offsets and
  // extent
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent,
      event DepEvent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent,
      const std::vector<event> &DepEvents);

  // Simple device to device opaque memory to USM copy
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      event DepEvent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      const std::vector<event> &DepEvents);

  // Device to device opaque memory to USM copy with offsets and extent
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent,
      event DepEvent);
  event ext_oneapi_copy(
      const ext::oneapi::experimental::image_mem_handle Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent,
      const std::vector<event> &DepEvents);


  // Simple device to device USM to opaque memory copy
  event ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc);
  event ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      event DepEvent);
  event ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      ext::oneapi::experimental::image_mem_handle Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      const std::vector<event> &DepEvents);

  // Device to device USM to opaque memory copy with offsets and extent
  event ext_oneapi_copy(
      const void *Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent);
  event ext_oneapi_copy(
      const void *Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent,
      event DepEvent);
  event ext_oneapi_copy(
      const void *Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      ext::oneapi::experimental::image_mem_handle Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      sycl::range<3> CopyExtent,
      const std::vector<event> &DepEvents);

  // Simple device to device or host to host USM to USM copy
  event ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch);
  event ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      event DepEvent);
  event ext_oneapi_copy(
      const void *Src,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      void *Dest,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      const std::vector<event> &DepEvents);

  // Device to device or host to host USM to USM copy with offsets and extent
  event ext_oneapi_copy(
      const void *Src, sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent);
  event ext_oneapi_copy(
      const void *Src, sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch, void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent, event DepEvent);
  event ext_oneapi_copy(
      const void *Src,
      sycl::range<3> SrcOffset,
      const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
      size_t SrcRowPitch,
      void *Dest,
      sycl::range<3> DestOffset,
      const ext::oneapi::experimental::image_descriptor &DestImgDesc,
      size_t DestRowPitch,
      sycl::range<3> CopyExtent,
      const std::vector<event> &DepEvents);
};
}
```

To enable the copying of images an `ext_oneapi_copy` function is proposed as a 
method of the queue and handler. It can be used to copy image memory, whether 
allocated through USM or using an `image_mem_handle`, from host to 
device, or device to host. Device to device copies are currently supported only 
through `image_mem_handle` allocations. 
For the `ext_oneapi_copy` variants that do not take 
offsets and extents, the image descriptor passed to the `ext_oneapi_copy` API 
is used to determine the pixel size, dimensions, and extent in memory of the 
image to copy. If performing sub-region copy, the size of the memory region is 
also determined by the offsets and extent passed.

For images allocated using USM, existing SYCL functionality can be used to 
copy their memory, but we also provide `ext_oneapi_copy` functions that take 
USM pointers. If the image memory was allocated using `pitched_alloc_device`, 
then the source and destination, row pitch parameter passed must match that 
which was returned from `pitched_alloc_device`. If the user opted to 
use another allocation function then the device pitch parameters must adhere to 
the alignment restrictions outlined in the 
"Pitch alignment restrictions and queries" section.

Unless performing a sub-region copy, the user must ensure that the memory 
regions accessed through `Dest` and `Src` have the same capacity. 

Whether copying image memory to the device through a USM `Dest` pointer, or an 
`image_mem_handle`, the host memory is always assumed to be tightly packed. 
Similarly, the host memory is assumed to be packed when copying from device to 
host.

For the functions that take an `image_mem_handle`, the handle must have been 
allocated within the same context and device of the `queue`.

For the forms that take a USM pointer, the image memory must also have been 
allocated within the same context and device of the `queue`. The USM memory 
must be accessible on the queue's device.

The `ext_oneapi_copy` function variants that don't take offsets and extents may 
fail in the following scenarios:

1. The `Src` and `Dest` memory was not allocated on the same device and 
context of the queue.

2. The `Src` and `Dest` memory regions, where `Src` or `Dest` can be either 
on the host or device, do not have the same memory capacity, where the capacity 
is calculate from the `width`, `height`, `depth`, `channel_order`, and 
`channel_type` members of the `image_descriptor` parameter.

The `ext_oneapi_copy` function variants that do take offsets and extents may 
fail in the following scenarios:

1. The `Src` and `Dest` memory was not allocated on the same device and 
context of the queue.

2. The image descriptor passed does not match the image descriptor used to 
allocate the image on the device.

3. the `CopyExtent` describes a memory region larger than that which was 
allocated on either the host or the device.

4. The `HostExtent` describes a memory region larger than that which was 
allocated on the host.

5. The `SrcExtent` describes a memory region larger than that which was 
allocated, where `Src` can be either the host or device.

6. The `DestExtent` describes a memory region larger than that which was 
allocated, where `Dest` can be either the host or device.

7. If `SrcOffset + CopyExtent` moves the memory sub-region outside the bounds 
of the memory described by `Src`, irrespective of whether `Src` is on the host 
or the device.

8. If `DestOffset + CopyExtent` moves the memory sub-region outside the bounds 
of the memory described by `Dest`, irrespective of whether `Dest` is on the 
host or the device.

9. The `DeviceRowPitch` does not adhere to the alignment requirements 
outlined in section "Pitch alignment restrictions and queries"

10. The value of `DeviceRowPitch` is smaller than the width of the image on 
the device.

If copying of an image fails, `ext_oneapi_copy` will throw a `sycl::exception` 
with error code `sycl::errc::invalid`, and relay an error message back to the 
user through `sycl::exception::what()`, describing which of the scenarios 
listed above caused the failure.

=== Reading and writing inside the kernel [[reading_writing_inside_kernel]]

```cpp
namespace sycl::ext::oneapi::experimental {

template <typename DataT, typename HintT = DataT, typename CoordT>
DataT fetch_image(const unsampled_image_handle &ImageHandle,
                  const CoordT &Coords);

template <typename DataT, typename HintT = DataT, typename CoordT>
DataT fetch_image(const sampled_image_handle &ImageHandle,
                  const CoordT &Coords);
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT sample_image(const sampled_image_handle &ImageHandle, 
                   const CoordT &Coords);

template <typename DataT, typename CoordT>
void write_image(unsampled_image_handle ImageHandle,
                 const CoordT &Coords, const DataT &Color);
}
```

Inside a kernel, it's possible to retrieve data from an image via `fetch_image` 
or `sample_image`, passing the appropriate image handle. The `fetch_image` API 
is applicable to sampled and unsampled images, and the data will be fetched 
exactly as is in device memory. The `sample_image` API is only applicable to 
sampled images, the image data will be sampled according to the 
`bindless_image_sampler` that was passed to the image upon construction.

When fetching from a sampled image handle, data exactly as is in memory, no 
sampling operations will be performed, and the `bindless_image_sampler` passed 
to the image upon creation has no effect on the returned image data. Note that 
not all devices may support fetching of sampled image data depending on the 
dimension or backing memory type. We provide device aspect queries for this in 
<<querying_sampled_image_fetch_support>>.

Additionally, not all devices may support sampling of USM images. We provide
device aspect queries for this in <<querying_usm_sample_support>>

The user is required to pass a `DataT` template parameter, which specifies the
return type of the `fetch_image` and `sample_image` functions. If `DataT` is 
not a recognized standard type, as defined in <<recognized_standard_types>>, 
and instead a user-defined type, the user must provide a `HintT` template 
parameter to the `fetch_image` and `sample_image` functions, to allow the 
backend to select the correct device intrinsic to fetch or sample their data.

`HintT` must be one of the the <<recognized_standard_types>>, and must be the 
same size as `DataT`.
If `DataT` is a recognized standard type, and `HintT` is also passed, `HintT` 
will be ignored.

When fetching or sampling an image backed by a normalized integer channel type, 
either `DataT` must be a 32-bit or 16-bit floating point value, a `sycl::vec` 
of 32-bit or 16-bit floating point values, or, in the case `DataT` is not one 
of the above, then `HintT` must be one of the above, and be of the same size as 
`DataT`.

It's possible to write to an unsampled image via `write_image` passing the 
handle of the image to be written to, along with the coordinates to write to and 
the data. User-defined types are allowed to be written provided that type is 
trivially copyable. The user defined type must also be of the same size as any 
of the <<recognized_standard_types>>.

Sampled images cannot be written to using `write_image`.

For fetching and writing of unsampled images, coordinates are specified by 
`int`, `sycl::vec<int, 2>`, and `sycl::vec<int, 3>` for 1D, 2D, and 3D images, 
respectively.

Sampled image "sampled reads" take `float`, `sycl::vec<float, 2>`, and 
`sycl::vec<float, 3>` coordinate types for 1D, 2D, and 3D images, respectively.

Sampled image "fetch reads" take `int`, `sycl::vec<int, 2>`, and 
`sycl::vec<int, 3>` coordinate types for 1D, 2D, and 3D images, respectively.

Note also that all images must be used in either read-only or write-only fashion 
within a single kernel invocation; read/write images are not supported.

Note also that read-after-write functionality is not supported. Unsampled 
images may be read from and written back to within the same kernel, however, 
reading from that same image again will result in undefined behaviour. A new 
kernel must be submitted for the written data to be accessible.

[NOTE]
====
Attempting to sample a standard sampled image with `sample_mipmap` or any other 
defined sampling function will result in undefined behaviour.
====

=== Recognized standard types [[recognized_standard_types]]

For the purposes of this extension, the following are classified as recognized 
standard types.

* All POD types (`char`, `short`, `int`, `float`, etc.) excluding `double`
* `sycl::half`
* Variants of `sycl::vec<T, N>` where `T` is one of the above, and `N` is `1`, 
  `2`, or `3`

Any other types are classified as user-defined types.

==== User-defined types

Some examples of a user-defined types may be:

```c++
struct my_float4 {
  float r, g, b, a;
};

struct my_short2 {
  short r, g;
};
```

When providing the above types as `DataT` parameters to an image read function, 
the corresponding `HintT` parameters to use would be `sycl::vec<float, 4>` and
`sycl::vec<short, 2>`, respectively.

=== Querying sampled image fetch support [[querying_sampled_image_fetch_support]]

We provide the following device queries to query support for sampled image 
fetch with various backing memory types and dimensionalities.

The device aspect descriptors for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor | Description
|`aspect::ext_oneapi_bindless_sampled_image_fetch_1d_usm` | 
 Indicates if the device is capable of fetching USM backed 1D 
 sampled image data.
|`aspect::ext_oneapi_bindless_sampled_image_fetch_1d` | 
 Indicates if the device is capable of fetching non-USM backed 1D 
 sampled image data.
|`aspect::ext_oneapi_bindless_sampled_image_fetch_2d_usm` | 
 Indicates if the device is capable of fetching USM backed 2D 
 sampled image data.
|`aspect::ext_oneapi_bindless_sampled_image_fetch_2d` | 
 Indicates if the device is capable of fetching non-USM backed 2D 
 sampled image data.
|`aspect::ext_oneapi_bindless_sampled_image_fetch_3d` | 
 Indicates if the device is capable of fetching non-USM backed 3D 
 sampled image data.
|======================

=== Querying USM sampling support [[querying_usm_sample_support]]

We provide the following device queries to query support for sampling USM
images.

The device aspect descriptors for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor | Description
|`aspect::ext_oneapi_bindless_images_sample_1d_usm` | Indicates if the device
supports the sampling of 1D bindless images backed by USM.
|`aspect::ext_oneapi_bindless_images_sample_2d_usm` | Indicates if the device
supports the sampling of 2D bindless images backed by USM.
|======================

=== Querying unique addressing support [[querying_unique_addressing_support]]

We provide the following device queries to query support information for
unique addressing for each image dimension.

The device aspect descriptor for this query is:

[frame="none",options="header"]
|======================
|Device descriptor |Description
|`aspect::ext_oneapi_unique_addressing_per_dim` | Indicates if the device 
supports unique addressing per dimension when sampling.
|======================

== Mipmapped images

So far, we have described how to create and operate on standard bindless images.
Another type of image we propose support for is a mipmapped image. Mipmapped 
images are an image type with multiple levels. Each consecutive dimension of a 
mipmapped image level is smaller than the previous level. The dimensions of a 
succeeding mip level is half that of the preceding level. As an example, a 
two-dimensional mipmapped image where the top-most level (`level==0`) image has 
a `width==16` and `height==16`, the succeeding level (`level==1`) in the mipmap 
will have sizes `width==8` and `height==8`. This pattern continues until either 
the final level has sizes of `width==1` and `height==1`, or the user-specified 
maximum mip level has been reached (described by the `num_levels` member of 
`image_descriptor`).

=== Querying mipmap support

We provide the following device queries to retrieve information on a SYCL 
implementation of various mipmap features.

The device aspect descriptors for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor |Description
|`aspect::ext_oneapi_mipmap` | Indicates if the device supports allocating 
mipmap resources.
|`aspect::ext_oneapi_mipmap_anisotropy` | Indicates if the device supports 
sampling mipmap images with anisotropic filtering
|`aspect::ext_oneapi_mipmap_level_reference` | Indicates if the device supports 
using images created from individual mipmap levels
|======================

The device information descriptors for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor |Return type |Description
|`ext::oneapi::experimental::info::device::mipmap_max_anisotropy` |`float` |
Return the maximum anisotropic ratio supported by the device
|======================

=== Allocation of mipmapped images

Mipmaps are allocated in a similar manner to standard images, however, mipmaps 
do not support USM backed memory.

Mipmap memory is allocated through `alloc_image_mem`. The user should populate 
the `image_descriptor` with the image type of `image_type::mipmap`, and provide 
the number of mipmaps levels they wish to allocate. The value of `num_levels` 
must be greater than `1`.

Mipmap memory allocated this way requires the user to free that memory after all 
operations using the memory are completed and no more operations operating on 
the memory will be scheduled. This is done using `free_image_mem`, passing 
`image_type::mipmap`. Importantly, individual levels of a mipmap must not be 
freed before calling `free_image_mem`.

The RAII class `image_mem` may also be used to perform allocation and 
deallocation of mipmap device memory. The constructor and destructor act as a 
wrapper for the functions `alloc_image_mem` and `free_image_mem` respectively.

When the underlying memory of `image_mem` is a mipmap, 
`get_mip_level_mem_handle` can be used to return an `image_mem_handle` to a 
specific level of the mipmap. This can then be used to copy data to that 
specific level or create an image based on that level. 

=== Obtaining a handle to a mipmap image

A handle to a mipmap image is acquired in the same way as a 
`sampled_image_handle`. Mipmaps can only be sampled image types. We can create a 
`sampled_image_handle` to the allocated mipmap through the `create_image` 
functions which take a `bindless_image_sampler`. To sample a mipmap correctly, 
the mipmap attributes of this sampler must be defined.

Attempting to create an `unsampled_image_handle` to a mipmap will result in a 
`sycl::exception` with error code `sycl::errc::runtime` being thrown.

=== Copying mipmap image data

In order to copy to or from mipmaps, the user should retrieve an individual 
level's `image_mem_handle` through `image_mem::get_mip_level_mem_handle`, which 
can then be passed to `ext_oneapi_copy`. The user must ensure that the image 
descriptor passed to `ext_oneapi_copy` is dimensioned correctly for the mip 
level being copied to/from. The provided `image_descriptor::get_mip_level_desc` 
allows the user to retrieve a correctly dimensioned image descriptor for any 
level of a given top-level descriptor.

=== Reading a mipmap

Inside the kernel, it's possible to sample a mipmap via `sample_mipmap`, 
passing the `sampled_image_handle`, the coordinates, and either the level or 
anisotropic gradient values.

The method of sampling a mipmap is different based on which `sample_mipmap` 
function is used, and the sampler attributes passed upon creation of the 
mipmap.

```c++
// Nearest/linear filtering between mip levels
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT sample_mipmap(const sampled_image_handle &ImageHandle,
                    const CoordT &Coords,
                    const float Level);

// Anisotropic filtering
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT sample_mipmap(const sampled_image_handle &ImageHandle,
                    const CoordT &Coords,
                    const CoordT &Dx, const CoordT &Dy);
```

Reading a mipmap follows the same restrictions on what coordinate types may be 
used as laid out in <<reading_writing_inside_kernel>>, and the viewing gradients 
are bound to the same type as used for the coordinates.

Reading a mipmap by providing a user-defined return `DataT` type also follows 
the restrictions as laid out in <<reading_writing_inside_kernel>>.

[NOTE]
====
Attempting to sample a mipmap with `sample_image` or any other defined sample 
function will result in undefined behaviour.
====

== Image arrays

Another type of image we propose support for is image arrays. Image arrays are 
images made up of multiple array indices where each index is itself an image and
every index has the same dimensionality, size, and data type.

Image arrays may also be referred to as layered images, and the array indices 
may be referred to layers.

=== Querying image array support

We provide the following device aspect to retrieve support information for
image arrays.

The device aspect descriptor for this query is:

[frame="none",options="header"]
|======================
|Device descriptor |Description
|`aspect::ext_oneapi_image_array` | Indicates if the device supports
image arrays.
|======================

=== Allocation of image arrays

Image arrays are allocated in a similar manner to standard images.

Image array memory is allocated through `alloc_image_mem`. The user should 
populate the `image_descriptor` with the image type of `image_type::array`, 
and provide the size of the array they wish to allocate. The value of 
`array_size` must be greater than `1`.

Image array memory allocated this way requires the user to free that memory 
after all operations using the memory are completed and no more operations 
operating on the memory will be scheduled. This is done using `free_image_mem`, 
passing `image_type::array`.

The RAII class `image_mem` may also be used to perform allocation and 
deallocation of arrayed image device memory. The constructor and destructor act 
as a wrapper for the functions `alloc_image_mem` and `free_image_mem` 
respectively.

[NOTE]
====
Currently there is no support for image arrays backed by USM.
====

=== Obtaining a handle to an image array

A handle to an image array is acquired in the same way as a `standard` image for 
both an `unsampled_image_handle` and `sampled_image_handle`. We create the 
handle for an image array through the appropriate `create_image` functions which 
take the `image_descriptor` and `bindless_image_sampler` for a 
`sampled_image_handle`, or just the `image_descriptor` for an 
`unsampled_image_handle`. 

As with allocation, the descriptor must be populated appropriately, i.e. 
`image_type::array`  and `array_size` is greater than `1`. 

=== Copying image array data [[copying_image_array_data]]

When copying to or from image arrays, the user should utilize `ext_oneapi_copy`
and pass the image arrays' `image_mem_handle`, and any applicable sub-region
copy parameters, as outlined in <<explicit_copies>>.

In order to copy to specific layers of an image array, the offset and extent
parameters involved in sub-region copies must be populated such that the 3rd
dimension of the ranges represent the arrays' layer(s) being copied, regardless
of whether the copy is performed on a 1D or 2D image array.

=== Reading an image array

Inside the kernel, it's possible to retrieve data from an image array via the 
following APIs which all take an image handle, the coordinates to retrieve from, 
as well as the array layer index to retrieve from.

```cpp
namespace sycl::ext::oneapi::experimental {

// Fetch an unsampled image array.
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT fetch_image_array(const unsampled_image_handle &ImageHandle,
                        const CoordT &Coords, unsigned int arrayLayer);

// Fetch a sampled image array.
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT fetch_image_array(const sampled_image_handle &ImageHandle,
                        const CoordT &Coords, unsigned int arrayLayer);

// Sample a sampled image array.
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT sample_image_array(const sampled_image_handle &ImageHandle, 
                         const CoordT &Coords, unsigned int arrayLayer);
}
```

These functions follow the same template restrictions as with retrieving data 
from standard image types, as laid out in <<reading_writing_inside_kernel>>. In
addition, the provided `arrayLayer` must be a valid zero-indexed value within 
the bounds of the associated `image_descriptor::array_size`, otherwise behaviour
is undefined.

When sampling an image array, the sampling is done only within array layers and 
not across layers.

[NOTE]
====
Attempting to fetch or sample from an image array with any other defined 
functions, including those for standard and cubemapped images, will result in 
undefined behaviour.
====

=== Writing an image array

Inside the kernel, it's possible to write to an image array via 
`write_image_array`, passing the `unsampled_image_handle`, the coordinates, the 
array index, and the data to write. User-defined types are allowed to be written 
provided that type is trivially copyable.

```c++
// Write to an unsampled image array
template <typename DataT, typename CoordT>
DataT write_image_array(unsampled_image_handle ImageHandle,
                        const CoordT &Coords, unsigned int ArrayLayer
                        const DataT &Color);
```

Writing to an image array follows the same restrictions on what coordinate types 
may be used as laid out in <<reading_writing_inside_kernel>>.

[NOTE]
====
Attempting to write to an image array with `write_image` or any other defined 
write function will result in undefined behaviour.
====

== Cubemapped images

Another image type this extension supports is cubemapped images. Cubemap images 
are a specialisation of 2D image arrays that have exactly six layers 
representing the faces of a cube where the width and height of each layer (cube 
face) are equal. Cube mapping is a method of environment mapping, where the 
environment is projected onto the sides of the cube. Cubemaps have been applied 
in graphical systems such as skylight illumination, dynamic reflection, and 
skyboxes.

=== Querying cubemap support

We provide the following device aspects to retrieve support information on a 
SYCL implementation of just a couple of cubemap features.

The device aspect descriptors for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor |Description
|`aspect::ext_oneapi_cubemap` | Indicates if the device supports allocating 
and accessing cubemap resources
|`aspect::ext_oneapi_cubemap_seamless_filtering` | Indicates if the device 
supports sampling cubemapped images across face bounderies
|======================


=== Allocation of cubemapped images

As with all other image types, cubemap memory is allocated through 
`alloc_image_mem` with the appropriately populated `image_descriptor`, where 
`width` and `height` are equal, and the type is set to `image_type::cubemap`. 
Since cubemaps are specialised image arrays, the `array_size` must be populated 
with the only valid value, 6. Overriding this with any other value for 
`array_size` could result in an exception or undefined behaviour. Cubemaps are 
not supported with USM backed memory.

Cubemap memory allocated this way requires the user to free that memory after 
all operations using the memory are completed and no more operations operating 
on the memory will be scheduled. This is done using `free_image_mem`, passing 
`image_type::cubemap`. 

The RAII class `image_mem` may also be used to perform allocation and 
deallocation of cubemapped device memory. The constructor and destructor act as 
a wrapper for the functions `alloc_image_mem` and `free_image_mem` respectively.

=== Obtaining a handle to a cubemap

A handle to a cubemap is acquired in the same way as a `standard` image for both 
an `unsampled_image_handle` and `sampled_image_handle`. We create the handle for 
a cubemap through the appropriate `create_image` functions which take the 
`image_descriptor` and `bindless_image_sampler` for a `sampled_image_handle`, or 
just the `image_descriptor` for an `unsampled_image_handle`. 

As with allocation, the descriptor must be populated appropriately, i.e. 
`image_type::cubemap`, `width` and `height` are equal, and `array_size` is equal 
to 6. To sample a cubemap as expected, the cubemap sampling attribute of the 
sampler, namely `seamless_filtering_mode`, must be defined.

=== Copying cubemap image data

In order to copy to or from cubemaps, the user should utilise the provided 
`ext_oneapi_copy` functions following the details laid out in 
<<copying_image_array_data>>. 

=== Reading, writing, and sampling a cubemap

Cubemaps are supported as both unsampled and sampled images, however, the 
meaning of their usage is quite different. 

An unsampled cubemap is treated as an image array with six layers, i.e. an 
integer index denoting a face and two integer coordinates addressing a texel 
within the layer corresponding to this face. Inside the kernel, this is done via 
`fetch_cubemap`, passing the `unsampled_image_handle`, the integer coordinates, 
`int2`, and an integer index denoting the face, `int`. Being an unsampled image, 
a cubemap can be written with `write_cubemap`, passing the 
`unsampled_image_handle`, the integer coordinates, `int2`, and an integer index 
denoting the face, `int`.

On the other hand, a sampled cubemap is addressed using three floating-point 
coordinates `x`, `y`, and `z` that are interpreted as a direction vector 
emanating from the centre of the cube and pointing to one face of the cube and a 
texel within the layer corresponding to that face. Inside the kernel, this is 
done via `sample_cubemap`, passing the `sampled_image_handle`, the 
floating-point coordinates `x`, `y`, and `z`, as a `float3`. The method of 
sampling depends on the sampler attributes passed upon creation of the cubemap.

```c++
// Unsampled cubemap read 
template <typename DataT, typename HintT = DataT>
DataT fetch_cubemap(const unsampled_image_handle &ImageHandle,
                    const int2 &Coords,
                    int Face);

// Sampled cubemap read
template <typename DataT, typename HintT = DataT>
DataT sample_cubemap(const sampled_image_handle &ImageHandle,
                     const float3 &Vec);

// Unsampled cubemap write
template <typename DataT>
void write_cubemap(unsampled_image_handle ImageHandle,
                   const int2 &Coords,
                   int Face, 
                   const DataT &Color);
```

[NOTE]
====
Attempting to read or write to a cubemap with any other defined read/write 
function will result in undefined behaviour.
====

== Interoperability

=== Querying interoperability support

We provide the following device queries to retrieve information on whether a 
SYCL implementation provides support for various interoperability features.

The device aspect descriptors for these queries are:

[frame="none",options="header"]
|======================
|Device descriptor |Description
|`aspect::ext_oneapi_external_memory_import` | Indicates if the device supports 
importing external memory resources.
|`aspect::ext_oneapi_external_semaphore_import`` | Indicates if the device 
supports importing external semaphore resources.
|======================


[NOTE]
====
Not all SYCL backends may provide support for importing or exporting native 
memory or semaphore objects. CUDA for example only supports importation of 
external memory and semaphores, but provides no support for their exportation.
====

=== External Resource types

In order to facilitate the importing of a number of different external memory 
and external semaphore handle types, we propose the following resource 
structures.

[NOTE]
====
We only show three examples of external resource handle types here, but the 
`external_mem_descriptor` and `external_semaphore_descriptor` structs, as 
defined in <<importing_external_memory_objects>> and 
<<importing_external_semaphores>>, could be templated by any number of handle 
types, provided that the SYCL implementation provides support for them.
====

```cpp
namespace sycl::ext::oneapi::experimental {

// POSIX file descriptor handle type
struct resource_fd {
  int file_descriptor;
};

// Windows NT handle type
struct resource_win32_handle {
  void *handle;
};

// Windows NT name type
struct resource_win32_name {
  const void *name;
};

}
```

=== Importing external memory objects [[importing_external_memory_objects]]

In order to import a memory object, an external API must provide an appropriate 
handle to that memory. The exact structure and type of this handle can depend on 
the external API, and the operating system the application is running on.

External memory import is facilitated through the following proposed descriptor 
struct.

```cpp
namespace sycl::ext::oneapi::experimental {

// Types of external memory handles
enum class external_mem_handle_type {
  opaque_fd = 0,
  win32_nt_handle = 1,
  win32_nt_dx12_resource = 2,
};

// Descriptor templated on specific resource type
template <typename ResourceType>
struct external_mem_descriptor {
  ResourceType external_resource;
  external_mem_handle_type handle_type;
  size_t size_in_bytes;
};

}
```

The user should create an `external_mem_descriptor` templated on the appropriate 
handle type, `ResourceType`, for their purposes, e.g. `resource_fd` to describe 
a POSIX file descriptor resource on Linux systems, or a `resource_win32_handle` 
for Windows NT resource handles.

The user must populate the `external_mem_descriptor` with the appropriate 
`ResourceType` values, a `handle_type`, and the size of the external memory in 
bytes, before they can then import that memory into SYCL through 
`import_external_memory`. Note that some handle types can only be used in 
combination with certain resource types, for example the `opaque_fd` handle type
is only used on Linux systems and is only compatible with the `resource_fd` 
resource type.

```cpp
namespace sycl::ext::oneapi::experimental {

struct external_mem {
  using raw_handle_type = /* Implementation defined */;
  raw_handle_type raw_handle;
};

template <typename ResourceType>
external_mem import_external_memory(
    external_mem_descriptor<ResourceType> externalMemDescriptor,
    const sycl::device &syclDevice,
    const sycl::context &syclContext);

template <typename ResourceType>
external_mem import_external_memory(
    external_mem_descriptor<ResourceType> externalMemDescriptor,
    const sycl::queue &syclQueue);

image_mem_handle map_external_image_memory(
    external_mem externalMemHandle,
    const image_descriptor &imageDescriptor,
    const sycl::device &syclDevice,
    const sycl::context &syclContext);
image_mem_handle map_external_image_memory(
    external_mem externalMemHandle,
    const image_descriptor &imageDescriptor,
    const sycl::queue &syclQueue);

void *map_external_linear_memory(
    external_mem externalMemHandle,
    uint64_t size, uint64_t offset,
    const sycl::device &syclDevice,
    const sycl::context &syclContext);
void *map_external_linear_memory(
    external_mem externalMemHandle,
    uint64_t size, uint64_t offset,
    const sycl::queue &syclQueue);
}
```

The resulting `external_mem` can then be mapped, where the resulting type 
is an `image_mem_handle` or a `void *`. This can be used to construct images in
the same way as memory allocated through `alloc_image_mem`, 
`pitched_alloc_device`, or another USM allocation method. The `ext_oneapi_copy` 
operations also work with imported memory mapped to `image_mem_handle` and 
`void *` types.

When calling `create_image` with an `image_mem_handle` or `void *` mapped from 
an external memory object, the user must ensure that the image descriptor they 
pass to `create_image` has members that match or map to those of the external 
API. A mismatch between any of the `width`, `height`, `depth`, 
`image_channel_type`, or `num_channels` members will result in undefined 
behavior. Likewise, if the image is mapped to a linear USM (`void *`) region, 
the pitch value passed to `create_image` needs to match the pitch of the image 
as defined by the external API. Note that when external memory is mapped to a 
linear USM region, this is device-side USM, and not accessible on the host.

Additionally, the `image_type` describing the image must match to the image of 
the external API. The current supported importable image types are `standard` 
and `mipmap`. Attempting to import other image types will result in undefined 
behaviour.

Once a user has finished operating on imported memory, they must ensure that 
they destroy the imported memory handle through `release_external_memory`.

`release_external_memory` can only accept `external_mem` objects that were
created through `import_external_memory`.

```cpp
namespace sycl::ext::oneapi::experimental {

void release_external_memory(external_mem externalMem,
                             const sycl::device &syclDevice,
                             const sycl::context &syclContext);
void release_external_memory(external_mem externalMem,
                             const sycl::queue &syclQueue);
}
```

Destroying or freeing any imported memory through `image_mem_free` or 
`sycl::free` will result in undefined behavior.

=== Importing external semaphores [[importing_external_semaphores]]

In addition to proposing importation of external memory resources, we also 
propose importation of synchronization primitives. Just like the sharing of 
memory between APIs described above, any external APIs must provide a valid a 
handle to a valid semaphore resource they wish to share, and just as external 
memory resources handles can take different forms of structure and type 
depending on the API and operating system, so do external semaphore resource 
handles.

It is important to note, that the use of imported external semaphore objects
within SYCL has the restriction in that imported external semaphores can only
be used in conjuction with SYCL queues that have been constructed with the
`property::queue::in_order` property. The semaphore synchronization mechanism
is not supported for the default SYCL out-of-order queues. Use of the semaphore 
synchronization mechanism with SYCL queues which were not constructed with the 
`queue::in_order` property will result in undefined behaviour.

External semaphore import is facilitated through the following proposed 
descriptor struct.

```cpp
namespace sycl::ext::oneapi::experimental {

// Types of external semaphore handles
enum class external_semaphore_handle_type {
  opaque_fd = 0,
  win32_nt_handle = 1,
  win32_nt_dx12_fence = 2,
};

// Descriptor templated on specific resource type
template <typename ResourceType>
struct external_semaphore_descriptor {
  ResourceType external_resource;
  external_semaphore_handle_type handle_type;
};

}
```

The user should create an `external_semaphore_descriptor` templated on the 
appropriate handle type, `ResourceType`, for their purposes, e.g. `resource_fd` 
to describe a POSIX file descriptor resource on Linux systems, or a 
`resource_win32_handle` for Windows NT resource handles.

The user must populate the `external_semaphore_descriptor` with the appropriate 
`ResourceType` values, and `handle_type`, before they can then import that 
semaphore into SYCL through `import_external_semaphore`. Note that some handle 
types can only be used in combination with certain resource types, for example 
the `opaque_fd` handle type is only used on Linux systems and is only 
compatible with the `resource_fd` resource type.

```cpp
namespace sycl::ext::oneapi::experimental {

struct external_semaphore {
  using raw_handle_type = /* Implementation defined */;
  raw_handle_type raw_handle;
};

template <typename ResourceType>
external_semaphore import_external_semaphore(
    external_semaphore_descriptor<ResourceType>
        externalSemaphoreDescriptor,
    const sycl::device &syclDevice,
    const sycl::context &syclContext);

template <typename ResourceType>
external_semaphore import_external_semaphore(
    external_semaphore_descriptor<ResourceType>
        externalSemaphoreDescriptor,
    const sycl::queue &syclQueue);
}
```

The resulting `external_semaphore` can then be used in a SYCL command 
group, to either wait until the semaphore signalled, or signal the semaphore.

If the type of semaphore imported supports setting the state of discrete 
semaphore value (the semaphore type is `win32_nt_dx12_fence`), then the user 
can specify which value the semaphore operation should wait on, or signal.

We propose to extend the SYCL queue and handler classes with semaphore waiting 
and signalling operations.

```cpp
namespace sycl {

class handler {
public:
  void ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore);

  void ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t wait_value);

  void ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore);

  void ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t signal_value);
};

class queue {
public:
  event ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore);
  event ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      event DepEvent);
  event ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      const std::vector<event> &DepEvents);

  event ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t wait_value);
  event ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t wait_value, 
      event DepEvent);
  event ext_oneapi_wait_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t wait_value, 
      const std::vector<event> &DepEvents);

  event ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore);
  event ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      event DepEvent);
  event ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      const std::vector<event> &DepEvents);

  event ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t signal_value);
  event ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t signal_value,
      event DepEvent);
  event ext_oneapi_signal_external_semaphore(
      ext::oneapi::experimental::external_semaphore
          external_semaphore,
      uint64_t signal_value,
      const std::vector<event> &DepEvents);
};
}
```

The behaviour of waiting on a semaphore will depend on the type of the 
semaphore which was imported.

If the semaphore does not support setting of a discrete state value (the 
semaphore type is not `win32_nt_dx12_fence`), then any operations submitted to 
the queue after a `ext_oneapi_wait_external_semaphore` call will not begin 
until the imported semaphore is in a signalled state. After this, the semaphore 
will be reset to a non-signalled state.

If the semaphore does support setting of a discrete state value (the semaphore 
type is `win32_nt_dx12_fence`), then any operations submitted to the queue 
after a `ext_oneapi_wait_external_semaphore` call will not begin until the 
imported semaphore is in a state greater than or equal to the `wait_value`. The 
state of this type of semaphore will not be altered by the call to 
`ext_oneapi_wait_external_semaphore`.

When `ext_oneapi_signal_external_semaphore` is called, the external semaphore 
will either be set to a signalled state, or the state of the semaphore will be 
set to `signal_value`, depending on the type of semaphore which was imported.
This singalling will be done after all commands submitted to the queue prior to 
the `ext_oneapi_signal_external_semaphore` call complete.

`ext_oneapi_wait_external_semaphore` and `ext_oneapi_signal_external_semaphore` 
are non-blocking, asynchronous operations.

The user must ensure to destroy all SYCL external semaphore objects used to 
access the external semaphore once they are no longer required through 
`release_external_semaphore`.

```cpp
namespace sycl::ext::oneapi::experimental {

void release_external_semaphore(external_semaphore semaphoreHandle,
                                const sycl::device &syclDevice,
                                const sycl::context &syclContext);

void release_external_semaphore(external_semaphore semaphoreHandle,
                                const sycl::queue &syclQueue);

}
```

== Examples

=== 1D image read/write

```cpp
#include <sycl/sycl.hpp>

include::../../../test-e2e/bindless_images/examples/example_1_1D_read_write.cpp[lines=12..-1]
```

=== Reading from a dynamically sized array of 2D images


```cpp
#include <sycl/sycl.hpp>

include::../../../test-e2e/bindless_images/examples/example_2_2D_dynamic_read.cpp[lines=9..-1]
```

=== Reading a 1D mipmap with anisotropic filtering and levels
```cpp
#include <sycl/sycl.hpp>

include::../../../test-e2e/bindless_images/examples/example_3_1D_mipmap_anisotropic_filtering_and_levels.cpp[lines=10..-1]
```

=== 1D image array read/write
```cpp
#include <sycl/sycl.hpp>

include::../../../test-e2e/bindless_images/examples/example_4_1D_array_read_write.cpp[lines=14..-1]
```

=== Sampling a cubemap

```c++
#include <sycl/sycl.hpp>

include::../../../test-e2e/bindless_images/examples/example_5_sample_cubemap.cpp[lines=10..-1]
```

=== Using imported memory and semaphore objects

```c++
#include <sycl/sycl.hpp>

include::../../../test-e2e/bindless_images/examples/example_6_import_memory_and_semaphores.cpp[lines=14..-1]
```

== Implementation notes

The current DPC++ prototype only implements the proposal for the CUDA backend,
however we are actively exploring Level Zero with SPIR-V.
We are looking at other backend as well in order to ensure the extension can 
work across different backends.

== Issues

=== No dependency tracking

Because this extension allows images to work in a USM-like model,
there are similar limitations to using USM for non-images,
mainly the lack of dependency tracking and the need for users to manually 
synchronize operations.

=== Limitations when using USM as image memory

There are dimension specific limitations:

* 1D - Linear interpolation not possible in the CUDA backend.
       A workaround is to allocate 2D pitched memory with a height of 1.
* 2D - There are some alignment restrictions. See the "Pitch alignment 
       restrictions and queries" section, or use `pitched_alloc_device` to 
       allocate 2D USM image memory.
* 3D - No support at the moment. Possible support in non CUDA backends in the 
       future.

=== 3 channel format support

The ability to create an image with 3 channels depends on the backend.
There is currently no way to query a backend whether it supports this feature.
This query should be added in a later revision of the proposal.

=== Not supported yet

These features still need to be handled:

* Level Zero and SPIR-V support

== Revision History

[frame="none",options="header"]
|======================
|Rev |Date |Changes
|1 |2023-02-03 | Initial draft
|2 |2023-02-23 | - Added `image_mem_handle` for image memory allocated with 
                   `allocate_image`

                 - Added ability to create images from USM

                 - Added new way to copy images, removed requirement for copy 
                   direction

                 - Added image memory information getters to reflect 
                   `cuArray3DGetDescriptor` functionality
|3 |2023-03-30 | - Some text clarifications.

                 - Unsampled images can no longer be created from USM.
                 
                 - Added SYCL 1.2.1 `image_channel_order` and 
                   `image_channel_type` structs.

                 - Added `image_type` to enable construction of layered, 
                   mipmap, and cubemap images in the future.

                 - Added device information descriptors for querying pitched 
                   allocation size and alignment requirement.

                 - Added `ext_oneapi_copy` methods for the `sycl::handler`.

                 - `ext_oneapi_copy` functions now take the `Src` as the first 
                   parameter.

                 - Created `image_mem` as a RAII style class.
                
                 - Renamed `allocate_image` to `alloc_image_mem`
                 
                 - `pitched_alloc_device` can now take an `image_descriptor`.

                 - Added interoperability features

                 - Added support to query bindless image and interoperability 
                   capabilities

                 - Added mipmap support
|4 |2023-06-23 | - Added `sycl::device` parameter to multiple functions to 
                   clarify that images must be created and used on the same 
                   device.

                 - Changed naming and order of some parameters to be consistent
                   throughout the proposal and with core SYCL.

                 - Added variants of functions that take a `sycl::queue` 
                   instead of both `sycl::device` and `sycl::context`.

                 - Removed standalone wait and signal semaphore functions. These
                   should always go through the queue or handler methods.

                 - Removed `get_image_handle` and `get_sampler_handle` functions
                   from sampled and unsampled image handle structs. The structs 
                   have public handle members that can be retrieved without 
                   getters.

                 - Made all enum types and values unspecified

                 - Moved support queries to device aspects, improved naming of 
                   queries for better consistency, and moved device info queries 
                   to the experimental namespace.

                 - Added `get_mip_level_desc` member function to 
                   `image_descriptor`

                 - Fixed `get_mip_level_mem_handle` prototype in `image_mem`, 
                   and added a standalone function.

                 - Removed `ext_oneapi_copy` variants that take `image_mem`, 
                   the user should retrieve the raw handle and pass that 
                   themselves.

                 - Removed `ext_oneapi_copy` variants that take a mip level, 
                   the user should retrieve individual mip level image handles
                   themselves and pass that.

                 - Added `ext_oneapi_copy` variants that take offsets and the 
                   extent, to enable sub-region copy.

                 - Created a list of failure scenarios for `ext_oneapi_copy`, 
                   changed the failure error code to `errc::invalid`, and 
                   specified that the implementation should relay the reason 
                   for the failure back to the user.

                 - Added a `bindless_image_sampler` struct.

                 - Specified that `image_mem` must follow Common Reference 
                   Semantics.

                 - Updated code samples.
|4.1|2023-07-21| - Made bindless image sampler member names snake-case
|4.2|2023-08-18| - `write_image` now allows passing of user-defined types
|4.3|2023-09-08| - Clarify how normalized image formats are read
                 - Remove support for packed normalized image formats 
                   (`unorm_short_555`, `unorm_short_565`, `unorm_int_101010`)
|4.4|2023-09-12| - Added overload with `sycl::queue` to standalone functions
|4.5|2023-09-14| - Update wording for allocating images + fix typo
|4.6|2023-09-19| - Clarify restrictions on reading/writing coordinate types
|4.7|2023-10-16| - Introduce `read_mipmap` for mipmap access and clarify reading 
                   restrictions on image types
|4.8|2023-10-25| - Change the name of `map_external_memory_array` to 
                   `map_external_image_memory` to avoid CUDA terminology
|4.9|2023-11-13| - Add that the bindless sampler is default constructible 
                   and follows by-value semantics
|4.10|2023-11-15| - Added constructors for `sampled_image_handle` and 
                    `unsampled_image_handle` structs.
                  - Removed `raw_sampler_handle` member from 
                    `sampled_image_handle` struct. Awaiting LevelZero 
                    and SPIR-V extensions to mature before before deciding 
                    whether a `raw_sampler_handle` member is necessary.
                  - Renamed `image_handle` members in `sampled_image_handle` and
                    `unsampled_image_handle` structs to `raw_handle`.
|5.0|2023-11-21| - Added section "Recognized standard types", to simplify 
                   wording around what types are allowed to be read or written.
                 - Allow `read_image` and `read_mipmap` to return a 
                   user-defined type.
|5.1|2024-01-17| - Added overload for `ext_oneapi_copy` enabling device to device
                   copies using `image_mem_handle`.
|5.1|2023-12-06| - Added unique addressing modes per dimension to the 
                   `bindless_image_sampler`
|5.2|2024-02-14| - Image read and write functions now accept 3-component 
                   coordinates for 3D reads, instead of 4-component coordinates.
|5.3|2024-02-16| - Replace `read_image` and `read_mipmap` APIs in favor of more 
                   descriptive naming, with `fetch_image`, `sample_image`, and
                   `sample_mipmap`.
|5.4|2024-02-23| - Added support for unsampled image arrays.
                 - Creation of unsampled image arrays.
                 - Fetching/writing of unsampled image arrays.
                 - `image_type::array` added to enum.
                 - `array_size` member added to `image_descriptor`.
                 - `image_descriptor::verify()` member function added.
|5.5|2024-02-27| - Update interop with mipmap interop and slight redesign
                 - `interop` removed from `image_type`
|5.6|2024-03-04| - Added cubemap support.
                 - Allocation of cubemaps.
                 - Creation of cubemaps.
                 - Fetching/writing of unsampled cubemaps and sampling cubemaps.
                 - `image_type::cubemap` added to enum.
                 - Cubemap example.
                 - Updated `image_array_write` with non-const handle parameter.
                 - Removed `&` reference qualifier from `write_xxx` handle
                   parameter. 
|5.7|2024-04-09| - Allow fetching of sampled image data through the 
                   `fetch_image` API.
|5.8|2024-05-09| - Add missing cubemap `HintT` template parameter to 
                   `fetch_cubemap` and `sample_cubemap`.
|5.9|2024-05-14| - Default constructor for `image_descriptor`.
|5.10|2024-05-20| - Replaced `channel_order` field in `image_descriptor` with
                   `num_channels`.
                  - Renamed `image_mem` functions `get_image_channel_type()`
                    to `get_channel_type()` and `get_image_num_channels()` to
                    `get_num_channels()`.
                  - Removed `get_channel_order()` function from `image_mem`.
                    This function is redundant since images don't have a notion
                    of channel order, only the channel size. Use
                    `get_num_channels()` instead.
|5.11|2024-05-27| - Added `external_mem_handle_type` and 
                    `external_semaphore_handle_type` enums. These will allow 
                    multiple handle types to be consumed by the same interop API.
                  - Added `handle_type` field to the `external_mem_descriptor`
                    and `external_semaphore_descriptor` structs. This allows
                    multiple handle types to be consumed by the API, such as 
                    file descriptors, Windows NT handles, and other handles in 
                    the future.
                  - Added semaphore operations which can accept values. These
                    are only supported for certain semaphore types 
                    (e.g. `win32_nt_dx12_fence`).
|5.12|2024-06-19| - Add support for sampled image arrays.
                  - Sample image arrays with `sample_image_array` API.
                  - Fetch sampled image arrays with `fetch_image_array` API.
|5.13|2024-07-10| - Added const-qualifiers to `Src` param in `ext_oneapi_copy`
                    funcs.
|5.14|2024-07-17| - Rename `destroy_external_semaphore` to 
                    `release_external_semaphore`.
|5.15|2024-07-19| - Add missing device queries for image arrays, sampling USM 
                   images and unique addressing per dimension.
                 - Remove aspects for semaphore export, memory export and fetch 
                   3D USM images as they are not supported on any platform.
                 - Refine the description of `ext_oneapi_bindless_images` aspect
                   to indicate support for bindless image APIs.
|5.16|2024-07-24| - Renamed interop aspect queries, handles, semaphore wait and 
                    signal functions, by replacing `interop` with `external` for 
                    consistency with other interop related structs/funcs and 
                    3rd party interop API naming.
                  - Removed `handle` keyword from `interop_xxx_handle` to 
                    clear up possible confusion between 3rd party interop 
                    handles and the imported `interop_xxx_handle`.
|5.17|2024-07-30| - Add support for mapping external memory to linear USM using
                    `map_external_linear_memory`.
|6 |2024-08-05 | - Collated all changes since revision 5.
                 - Bumped SYCL_EXT_ONEAPI_BINDLESS_IMAGES to number 6.
|6.1|2024-09-09| - Update for image-array sub-region copy support.
|6.2|2024-09-26| - Added addressing mode `ext_oneapi_clamp_to_border` value,
                   equivalent to `clamp`, to match with external APIs.
|6.3|2024-10-02| - Add support for `image_mem_handle` to `image_mem_handle`
                   sub-region copies.
|6.4|2024-10-15| - Fix bindless spec examples and include examples in bindless
                   spec using asciidoc include.
|6.5|2024-10-22| - Allow 3-channel image formats on some backends.
|6.6|2025-01-20| - Clarify support for the specific types of USM allocations.
|6.7|2025-01-27| - Update `image_mem_handle` to `image_mem_handle` copies to
                   accept two image descriptors.
                 - Add support for USM to `image_mem_handle` copies and 
                   sub-copies.
                 - Add support for `image_mem_handle` to USM copies and
                   sub-copies.
                 - Add support for USM to USM copies and sub-copies.
                 - Add support for host to host copies and sub-copies.
|======================
