Memory Interop and C++ abstract interfaces

Intel® DL Streamer provides an independent sub-component for zero-copy buffer sharing and memory interop between the following frameworks and memory handles on CPU and GPU:

  • CPU memory void*

  • FFmpeg AVFrame

  • GStreamer GstBuffer and GstMemory

  • Level-Zero USM pointers

  • OpenCL cl_mem

  • OpenCV cv::Mat

  • OpenCV cv::UMat

  • OpenVINO™ ov::Tensor and ov::RemoteTensor

  • SYCL USM pointers

  • VA-API VASurfaceID

The memory interop sub-component is available via APT installation (`sudo apt install intel-dlstreamer-cpp`) and on GitHub.

Note

This sub-component is implemented as a C++ header-only library. Python bindings for this library are planned for upcoming releases.

Why a memory interop library?

Each media and compute framework with accelerator support (GPU, VPU) defines its own interfaces for device and context creation, memory allocation, and task submission. Most frameworks also expose export/import interfaces to convert memory objects to and from other memory handles:

  • High-level media frameworks (FFmpeg, GStreamer) support conversion to/from low-level media handles (VA-API and DirectX surfaces)

  • Low-level media interfaces (VA-API, DirectX) support conversion to/from OS-specific general-purpose GPU memory handles such as DMA buffers on Linux and NT handles on Windows

  • OpenCL 3.0 recently introduced an extension for importing and exporting DMA buffers and NT handles

  • Intel® oneAPI Level Zero supports conversion between USM device pointers (accessible on GPU only) and DMA buffers / NT handles

Together, these interfaces allow zero-copy memory sharing between media operations submitted via media frameworks and SYCL/OpenCL compute kernels submitted into a SYCL/OpenCL queue, assuming the media and compute queues are created on the same physical GPU device.

Despite the multiple stages of memory handle conversion (FFmpeg/GStreamer, VA-API/DirectX, DMA/NT, Level-Zero, SYCL), all converted memory handles refer to the same physical memory block. Thus, writing data through one memory handle makes the data available through all other memory handles, assuming proper synchronization between write and read operations.
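The aliasing property can be illustrated on the CPU alone. The following is a minimal sketch, not Intel® DL Streamer code: `HandleA`, `HandleB`, `convert` and `write_then_read` are hypothetical names standing in for two framework-specific handles and the conversion between them. It only shows that a conversion which copies no bytes leaves both handles pointing at the same storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for two framework-specific memory handles.
// Neither one owns or copies the bytes; each only stores the address.
struct HandleA {
    std::uint8_t *data;
    std::size_t size;
};

struct HandleB {
    const std::uint8_t *data;
    std::size_t size;
};

// "Export" HandleA as HandleB: a handle conversion that copies no bytes.
inline HandleB convert(const HandleA &a) {
    return HandleB{a.data, a.size};
}

// Writes through the first handle, then reads the result through the second.
inline std::uint8_t write_then_read(std::vector<std::uint8_t> &storage,
                                    std::size_t index, std::uint8_t value) {
    assert(index < storage.size());
    HandleA a{storage.data(), storage.size()};
    HandleB b = convert(a);
    a.data[index] = value; // write via handle A
    return b.data[index];  // read via handle B: the same physical byte
}
```

On a single CPU thread the write is visible immediately. With GPU queues, the same visibility additionally requires the synchronization between write and read operations mentioned above.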

Below are references to some of the low-level interfaces used by the Intel® DL Streamer memory interop sub-component for zero-copy buffer sharing between media frameworks and OpenCL/SYCL:

  1. (Linux) VA-API to DMA-BUF

  2. DMA-BUF or NT-Handle to Level-zero

  3. OpenCL extension cl_khr_external_memory

Memory interop in a few lines - using Intel® DL Streamer

Intel® DL Streamer hides the complexity of the low-level interfaces and greatly simplifies memory interop. It defines the abstract interfaces `Tensor` and `MemoryMapper`, provides header-only implementations of the `Tensor` interface for various frameworks, and provides `MemoryMapper` implementations for all technically feasible zero-copy mappings on CPU and on GPU, as well as mappings between CPU and GPU:

digraph {
  node[shape=record,style=filled,fillcolor=lightskyblue1]

  Gst->CPU
  Gst->DMA
  Gst->OpenCL
  Gst->VAAPI
  DMA->USM
  USM->DMA
  DMA->OpenCL
  OpenCL->CPU
  OpenCL->DMA
  CPU->OpenCV
  OpenCL->OpenCV_UMat
  CPU->OpenVino
  OpenCL->OpenVino
  OpenVino->CPU
  VAAPI->OpenVino
  USM->CPU
  DMA->VAAPI
  VAAPI->DMA
  FFmpeg->VAAPI
  FFmpeg->CPU
}

Memory interop diagram

All memory mappers are implemented under the unified interface `MemoryMapper` and take a `TensorPtr` or `FramePtr` as the input parameter. Internally, each mapper from framework AAA to framework BBB casts the input pointer to the specific class AAA Tensor / AAA Frame and creates the output as the specific class BBB Tensor / BBB Frame. The table below lists these classes for each supported framework/library:

| Framework / Library | Native memory object | Class implementing Tensor | Class implementing Frame |
|---------------------|----------------------|---------------------------|--------------------------|
| CPU (no framework)  | void*                | CPUTensor                 | (BaseFrame)              |
| FFmpeg              | AVFrame              |                           | FFmpegFrame              |
| GStreamer           | GstMemory, GstBuffer | GSTTensor                 | GSTFrame                 |
| Level-zero          | void*                | USMTensor                 | (BaseFrame)              |
| OpenCL              | cl_mem               | OpenCLTensor              | (BaseFrame)              |
| OpenCV              | cv::Mat              | OpenCVTensor              | (BaseFrame)              |
| OpenCV              | cv::UMat             | OpenCVUMatTensor          | (BaseFrame)              |
| OpenVINO™           | ov::Tensor           | OpenVINOTensor            | OpenVINOFrame            |
| SYCL                | void*                | SYCLUSMTensor             | (BaseFrame)              |
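The cast-then-wrap pattern that each mapper follows can be sketched with the standard library only. Everything in the `sketch` namespace below is hypothetical and merely mirrors the shape described above; it is not the Intel® DL Streamer headers, and the real `Tensor` and `MemoryMapper` interfaces have more members. `VectorTensor` plays the role of the AAA tensor and `RawPointerTensor` the role of the BBB tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

namespace sketch { // hypothetical names; NOT the DL Streamer API

// Abstract tensor: the only type that callers of a mapper see.
struct Tensor {
    virtual ~Tensor() = default;
    virtual void *data() = 0;
    virtual std::size_t size() const = 0;
};
using TensorPtr = std::shared_ptr<Tensor>;

// "AAA" tensor: owns its bytes in a std::vector.
class VectorTensor : public Tensor {
  public:
    explicit VectorTensor(std::size_t n) : _bytes(n) {}
    void *data() override { return _bytes.data(); }
    std::size_t size() const override { return _bytes.size(); }

  private:
    std::vector<unsigned char> _bytes;
};

// "BBB" tensor: a raw-pointer view that keeps its parent tensor alive.
class RawPointerTensor : public Tensor {
  public:
    RawPointerTensor(void *ptr, std::size_t n, TensorPtr parent)
        : _ptr(ptr), _size(n), _parent(std::move(parent)) {}
    void *data() override { return _ptr; }
    std::size_t size() const override { return _size; }

  private:
    void *_ptr;
    std::size_t _size;
    TensorPtr _parent; // prevents the mapped memory from being freed early
};

// Unified mapper interface: abstract pointer in, abstract pointer out.
struct MemoryMapper {
    virtual ~MemoryMapper() = default;
    virtual TensorPtr map(TensorPtr src) = 0;
};

// Mapper AAA -> BBB: downcasts the input, then wraps the same memory.
class VectorToRawMapper : public MemoryMapper {
  public:
    TensorPtr map(TensorPtr src) override {
        assert(src);
        auto typed = std::dynamic_pointer_cast<VectorTensor>(src);
        if (!typed)
            throw std::invalid_argument("VectorToRawMapper expects a VectorTensor");
        return std::make_shared<RawPointerTensor>(typed->data(), typed->size(), typed);
    }
};

} // namespace sketch
```

In this sketch the output tensor holds a reference to its input, so the mapped view cannot outlive the memory it points to. Passing a tensor of the wrong concrete class is reported as an exception rather than silently reinterpreted.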

An application can create Tensor and Frame objects in two ways: by passing a pre-allocated native memory object to the C++ constructor (wrapping an already allocated object), or by passing allocation parameters to the C++ constructor (allocating new memory).
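The two construction modes differ mainly in who owns the memory. The sketch below illustrates that distinction with a hypothetical `sketch::CpuBuffer` class built on the standard library; it is not one of the Intel® DL Streamer tensor classes, whose constructors take framework-specific arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

namespace sketch { // hypothetical class; NOT the DL Streamer API

class CpuBuffer {
  public:
    // Wrap: adopt memory the caller already allocated. Never freed here.
    CpuBuffer(void *data, std::size_t size) : _data(data), _size(size) {
        assert(data != nullptr);
    }

    // Allocate: create new zero-initialized memory owned by this object.
    explicit CpuBuffer(std::size_t size)
        : _owned(new unsigned char[size]()), _data(_owned.get()), _size(size) {}

    void *data() const { return _data; }
    std::size_t size() const { return _size; }
    bool owns() const { return _owned != nullptr; }

  private:
    // Declared first so it is initialized before _data reads from it.
    std::unique_ptr<unsigned char[]> _owned;
    void *_data;
    std::size_t _size;
};

} // namespace sketch
```

A wrapped buffer must not outlive the memory it was given, while an allocated buffer releases its memory in its destructor.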

Many examples of how to allocate memory and how to create and use memory mappers can be found by searching for the word mapper in the samples and src folders of the source code on GitHub, for example the FFmpeg+DPCPP sample rgb_to_grayscale and almost every C++ element.

The special mapper `MemoryMapperChain` implements the unified interface `MemoryMapper` as an arbitrary chain of multiple mappers. For example, FFmpeg to DPC++/USM is a chain of the following mappers:

digraph {
  rankdir="LR"
  node[shape=record,style=filled,fillcolor=lightskyblue1]

  USM0[label="USM (Level-zero)"]
  USM1[label="USM (DPC++)"]

  FFmpeg->VAAPI->DMA->USM0->USM1
}

FFmpeg to USM memory mappers chain

and GStreamer to OpenCV UMat is a chain of the following mappers:

digraph {
  rankdir="LR"
  node[shape=record,style=filled,fillcolor=lightskyblue1]

  UMat[label="OpenCV cv::UMat"]

  Gst->VAAPI->DMA->OpenCL->UMat
}

Gst to OpenCV UMat memory mappers chain
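The idea of a chain that itself implements the mapper interface can be sketched as a small composite. All names in the `sketch` namespace are hypothetical and this is not how `MemoryMapperChain` is implemented: each hop here only relabels the memory type and shares the payload, standing in for a real zero-copy handle conversion.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch { // hypothetical names; NOT the DL Streamer API

// Minimal tensor: a memory-type label plus a shared payload (never copied).
struct Tensor {
    std::string memory_type;
    std::shared_ptr<std::vector<unsigned char>> payload;
};
using TensorPtr = std::shared_ptr<Tensor>;

struct MemoryMapper {
    virtual ~MemoryMapper() = default;
    virtual TensorPtr map(TensorPtr src) = 0;
};
using MemoryMapperPtr = std::shared_ptr<MemoryMapper>;

// One hop: accepts tensors of type `from` and returns the same payload as `to`.
class RelabelMapper : public MemoryMapper {
  public:
    RelabelMapper(std::string from, std::string to)
        : _from(std::move(from)), _to(std::move(to)) {}

    TensorPtr map(TensorPtr src) override {
        assert(src);
        if (src->memory_type != _from)
            throw std::invalid_argument("expected " + _from + ", got " + src->memory_type);
        return std::make_shared<Tensor>(Tensor{_to, src->payload});
    }

  private:
    std::string _from;
    std::string _to;
};

// The chain is itself a MemoryMapper: it feeds each hop's output to the next.
class MapperChain : public MemoryMapper {
  public:
    explicit MapperChain(std::vector<MemoryMapperPtr> mappers)
        : _mappers(std::move(mappers)) {}

    TensorPtr map(TensorPtr src) override {
        for (auto &mapper : _mappers)
            src = mapper->map(src);
        return src;
    }

  private:
    std::vector<MemoryMapperPtr> _mappers;
};

} // namespace sketch
```

Because the chain exposes the same `map` call as a single hop, calling code does not need to know how many intermediate conversions lie between the source and destination memory types.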

Abstract interfaces for C++ elements

Additionally, this Intel® DL Streamer sub-component defines the abstract interfaces Source, Transform and Sink, which are used as base interfaces for all C++ and GStreamer elements. These interfaces take unified pointers to Tensor and Frame objects as input and output parameters of the functions read, process and write, which makes it easy to build a chain of multiple operations. See the next page, C++ elements, for details.
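How the three interfaces compose into a chain can be sketched as follows. The signatures below are assumptions made for illustration (a `read` that returns a null pointer at end of stream, a `process` that returns its output frame, a `write` that returns nothing); the actual Intel® DL Streamer interfaces are defined in source.h, transform.h and sink.h and may differ.

```cpp
#include <cassert>
#include <memory>
#include <vector>

namespace sketch { // hypothetical signatures; NOT the DL Streamer API

struct Frame {
    std::vector<int> values;
};
using FramePtr = std::shared_ptr<Frame>;

struct Source {
    virtual ~Source() = default;
    virtual FramePtr read() = 0; // assumed: nullptr signals end of stream
};

struct Transform {
    virtual ~Transform() = default;
    virtual FramePtr process(FramePtr src) = 0;
};

struct Sink {
    virtual ~Sink() = default;
    virtual void write(FramePtr src) = 0;
};

// Emits `count` frames holding the values 0, 1, 2, ...
class CountingSource : public Source {
  public:
    explicit CountingSource(int count) : _count(count) {}
    FramePtr read() override {
        if (_next >= _count)
            return nullptr;
        auto frame = std::make_shared<Frame>();
        frame->values.push_back(_next++);
        return frame;
    }

  private:
    int _count;
    int _next = 0;
};

// Doubles every value in place and passes the same frame on.
class DoubleTransform : public Transform {
  public:
    FramePtr process(FramePtr src) override {
        for (int &v : src->values)
            v *= 2;
        return src;
    }
};

// Stores every frame it receives.
class CollectSink : public Sink {
  public:
    void write(FramePtr src) override {
        assert(src);
        frames.push_back(std::move(src));
    }
    std::vector<FramePtr> frames;
};

// Drives the chain: read, apply each transform in order, write.
inline void run(Source &source,
                const std::vector<std::shared_ptr<Transform>> &transforms,
                Sink &sink) {
    while (FramePtr frame = source.read()) {
        for (const auto &transform : transforms)
            frame = transform->process(frame);
        sink.write(frame);
    }
}

} // namespace sketch
```

Since every stage exchanges the same pointer type, stages can be reordered or inserted without changing the driver loop.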

How to use in a CMake build system

If an application uses the Intel® DL Streamer memory interop library and is built with CMake, add pkg_check_modules and include_directories statements like the ones below:

# pkg_check_modules is provided by the FindPkgConfig module
find_package(PkgConfig REQUIRED)
pkg_check_modules(DLSTREAMER dl-streamer REQUIRED)
include_directories(${DLSTREAMER_INCLUDE_DIRS})

For each framework involved in memory interop, add the corresponding include_directories and link_libraries statements as documented by that framework. For example, when using memory interop with OpenVINO™, the CMake file should contain lines like the ones below:

find_package(OpenVINO COMPONENTS runtime)
include_directories(${OpenVINO_INCLUDE_DIRS})
link_libraries(openvino::runtime)

File structure

The abstract interfaces are defined in the following header files, which `sudo apt install intel-dlstreamer-cpp` installs under the folder /opt/intel/dlstreamer/include/dlstreamer:

include/dlstreamer
├── audio_info.h
├── context.h
├── dictionary.h
├── element.h
├── frame.h
├── frame_info.h
├── image_info.h
├── image_metadata.h
├── memory_mapper_factory.h
├── memory_mapper.h
├── memory_type.h
├── metadata.h
├── sink.h
├── source.h
├── tensor.h
├── tensor_info.h
├── transform.h
└── utils.h

The following header files implement the Tensor interface for memory objects of various frameworks and MemoryMapper for memory mapping between frameworks. These header files are installed under the corresponding subfolders of /opt/intel/dlstreamer/include/dlstreamer by the same package, intel-dlstreamer-cpp:

include/dlstreamer
├── ffmpeg
│   ├── mappers
│   │   └── ffmpeg_to_vaapi.h
│   ├── context.h
│   ├── frame.h
│   └── utils.h
├── gst
│   ├── allocator.h
│   ├── context.h
│   ├── dictionary.h
│   ├── frame_batch.h
│   ├── frame.h
│   ├── mappers
│   │   ├── any_to_gst.h
│   │   ├── gst_to_cpu.h
│   │   ├── gst_to_dma.h
│   │   ├── gst_to_opencl.h
│   │   └── gst_to_vaapi.h
│   ├── metadata
│   │   ├── gva_audio_event_meta.h
│   │   ├── gva_json_meta.h
│   │   └── gva_tensor_meta.h
│   ├── metadata.h
│   ├── plugin.h
│   ├── tensor.h
│   └── utils.h
├── level_zero
│   ├── context.h
│   ├── mappers
│   │   ├── dma_to_usm.h
│   │   └── usm_to_dma.h
│   └── usm_tensor.h
├── opencl
│   ├── context.h
│   ├── mappers
│   │   ├── dma_to_opencl.h
│   │   ├── opencl_to_cpu.h
│   │   └── opencl_to_dma.h
│   ├── tensor.h
│   ├── tensor_ref_counted.h
│   └── utils.h
├── opencv
│   ├── context.h
│   ├── mappers
│   │   └── cpu_to_opencv.h
│   ├── tensor.h
│   └── utils.h
├── opencv_umat
│   ├── context.h
│   ├── mappers
│   │   └── opencl_to_opencv_umat.h
│   ├── tensor.h
│   └── utils.h
├── openvino
│   ├── context.h
│   ├── frame.h
│   ├── mappers
│   │   ├── cpu_to_openvino.h
│   │   ├── opencl_to_openvino.h
│   │   ├── openvino_to_cpu.h
│   │   └── vaapi_to_openvino.h
│   ├── tensor.h
│   └── utils.h
├── sycl
│   ├── context.h
│   ├── mappers
│   │   └── sycl_usm_to_cpu.h
│   └── sycl_usm_tensor.h
└── vaapi
    ├── context.h
    ├── frame_alloc.h
    ├── frame.h
    ├── mappers
    │   ├── dma_to_vaapi.h
    │   └── vaapi_to_dma.h
    ├── tensor.h
    └── utils.h