Re: [LSF/MM/BPF TOPIC] Enabling Smart Data Accelerator Interface (SDXI) Support for Linux

On 2/3/25 4:13 AM, Jonathan Cameron wrote:
On Fri, 31 Jan 2025 11:53:07 -0600
Wei Huang <wei.huang2@xxxxxxx> wrote:

Hi All,

I would like to propose a talk for the LSF/MM/BPF conference: Enabling
Smart Data Accelerator Interface (SDXI) Support for Linux.

The Smart Data Accelerator Interface (SDXI) is an industry standard [1]
that provides advanced capabilities, such as offloading DMA operations,
operating directly on user-space addresses, and other data processing
features. With SDXI integrated into an SoC, DMA offloading can now be
supported across different address spaces. This talk focuses on a
software design that enables comprehensive SDXI support across multiple
software layers in the Linux kernel. These interfaces not only
facilitate SDXI hardware management but also allow kernel-space
subsystems and user-space applications to directly own and control SDXI
hardware under the protection of the IOMMU.
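
To make the descriptor-based, user-address programming model concrete,
here is a rough user-space sketch. The struct layout, opcode value,
SDXI_SUBMIT ioctl, and /dev/sdxi0 node are all hypothetical
illustrations, not the spec's descriptor encoding or the prototype
driver's ABI:

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical descriptor: real SDXI descriptors are defined by the
 * spec [1]; this only illustrates "DMA on user virtual addresses". */
struct sdxi_copy_desc {
	uint32_t opcode;	/* copy operation (value is made up) */
	uint32_t flags;
	uint64_t src;		/* user-space virtual address */
	uint64_t dst;		/* user-space virtual address */
	uint64_t len;
};

#define SDXI_SUBMIT _IOW('x', 1, struct sdxi_copy_desc)	/* made up */

int main(void)
{
	static char src[4096] = "payload", dst[4096];
	struct sdxi_copy_desc d = {
		.opcode = 1,
		.src = (uintptr_t)src,
		.dst = (uintptr_t)dst,
		.len = sizeof(src),
	};
	int fd = open("/dev/sdxi0", O_RDWR);	/* hypothetical node */

	if (fd < 0)
		return 1;
	/* The IOMMU (PASID) setup behind the device keeps its DMA
	 * confined to this process's address space. */
	if (ioctl(fd, SDXI_SUBMIT, &d) < 0)
		return 1;
	close(fd);
	return 0;
}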

To illustrate the practical applications of SDXI, Red Hat and AMD
developed a user-space library that leverages the SDXI driver interface,
demonstrating various use cases, such as memory operation offloading, in
both bare-metal and virtual environments.

The prototype device driver [2] and user-space library are available for
testing. We continue to improve both components and plan to upstream the
device driver soon.

== DISCUSSION ==
At this conference, we plan to discuss the following topics with the community:

Hi Wei,

Lots of topics and hints at interesting areas, but I'd like to see more
details to understand how this maps to other data moving / reorganizing
accelerators.  Whilst SDXI looks like a good and feature-rich spec,
I'm curious what is fundamentally new?  Perhaps it is just the right
time to improve functionality for DMA engines in general.

Compared with existing implementations, I think the following (combined)
features are the interesting ones:

* An industry open standard that is architecture agnostic
* Support for various address spaces, including user-mode ones
* Designed with virtualization in mind, making passthrough and migration easy
* Easy to extend with future functionality



1) Use Cases
* Linux DMA engine
* Kernel task offloading (e.g., bulk copying)
* QoS and kernel perf integration
* New use cases

All interesting topics across this particular DMA engine and many others.
For new use cases, are you planning to bring some, or is this a request
for suggestions?

Both. Some use cases we have tested:
* Serving as a DMA engine in Linux
* AutoNUMA offloading
* Memory zeroing for large VM memory initialization
* Batching folio copy operations

We do expect more use cases, and we want to solicit ideas from the community.
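
For the bulk-copy style items above, the in-kernel side can be
illustrated with the existing dmaengine API. A minimal sketch (error
paths trimmed, buffers assumed to be already DMA-mapped for the
channel's device, and polling used where a real caller would take a
completion callback):

#include <linux/dmaengine.h>
#include <linux/err.h>

/* Offload one bulk copy to any memcpy-capable dmaengine channel. */
static int example_dma_bulk_copy(dma_addr_t dst, dma_addr_t src,
				 size_t len)
{
	dma_cap_mask_t mask;
	struct dma_chan *chan;
	struct dma_async_tx_descriptor *tx;
	dma_cookie_t cookie;
	int ret = 0;

	dma_cap_zero(mask);
	dma_cap_set(DMA_MEMCPY, mask);
	chan = dma_request_chan_by_mask(&mask);
	if (IS_ERR(chan))
		return PTR_ERR(chan);

	tx = dmaengine_prep_dma_memcpy(chan, dst, src, len,
				       DMA_PREP_INTERRUPT);
	if (!tx) {
		ret = -EIO;
		goto out;
	}

	cookie = dmaengine_submit(tx);
	/* dma_sync_wait() issues pending work and polls for completion;
	 * real users would normally register a callback instead. */
	if (dma_sync_wait(chan, cookie) != DMA_COMPLETE)
		ret = -EIO;
out:
	dma_release_channel(chan);
	return ret;
}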



2) User-Space API Interface
* IOCTL proposal

I'm curious about this aspect and how it compares with previous approaches.
Obviously you bring some new operators and possibly need to target remote
memory.  However, we have existing support for userspace access to
accelerators for crypto, compression, etc. (and much broader).

We went through a similar process finding a path to support those a few
years ago and ended up with UACCE (drivers/misc/uacce, plus lots of
related code under drivers/crypto). If there is overlap, it would be
good to figure out a path that reduces the duplication / complexity of
interfacing with the various userspace projects we all care about.  I
won't tell the stories of pain and redesigns it took to get UACCE
upstream, but if you are doing another new thing, good luck! (+CC some
folks more familiar and active in this space than I am.)

Thanks for the pointer. Right now, as a prototype, we don't take the UACCE approach, but we did see that UACCE could be utilized in this space. This can be part of the discussion.
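
For reference, the UACCE user-space pattern is roughly: open the
accelerator's character device (open() on an SVA-capable uacce device
binds the process address space / PASID), mmap the MMIO and shared
(DUS) queue regions, then start the queue. A rough sketch, where the
"sdxi-0" node name and one-page region sizes are assumptions:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <misc/uacce/uacce.h>	/* UACCE_CMD_START_Q, UACCE_QFRT_* */

int main(void)
{
	long pg = sysconf(_SC_PAGESIZE);
	/* Node name is driver-specific; "sdxi-0" is hypothetical. */
	int fd = open("/dev/sdxi-0", O_RDWR);
	void *mmio, *dus;

	if (fd < 0)
		return 1;
	/* The mmap offset (in pages) selects the queue file region;
	 * one-page regions are assumed here for brevity. */
	mmio = mmap(NULL, pg, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, UACCE_QFRT_MMIO * pg);
	dus = mmap(NULL, pg, PROT_READ | PROT_WRITE, MAP_SHARED,
		   fd, UACCE_QFRT_DUS * pg);
	if (mmio == MAP_FAILED || dus == MAP_FAILED)
		return 1;
	if (ioctl(fd, UACCE_CMD_START_Q) < 0)	/* start the queue */
		return 1;
	/* ... place descriptors in dus, ring doorbells via mmio ... */
	close(fd);
	return 0;
}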


* Security control
* User-space app integration

3) Virtualization Support
* Progress & current status

Good to have some more detail on this in particular.  Is this mostly blocked
on vSVA, IOMMUFD, etc. progress, or is there something new?

It is blocked by vIOMMU support in both the kernel and QEMU/KVM. To support SVA inside VMs, we have to present a virtual IOMMU to guest VMs. There are various ways of implementing a virtual IOMMU, but AMD hardware vIOMMU is expected to have better performance than an emulated vIOMMU (there was a KVM Forum talk by Suravee and me for reference). We have a prototype implementation, and Suravee is currently cleaning up and finishing the hardware vIOMMU patches for upstream.
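
Whether on bare metal or inside a guest behind a vIOMMU, the driver-side
SVA hook is the same IOMMU SVA bind API; a minimal sketch (two-argument
form as in recent kernels):

#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/sched.h>

/* Bind the calling process's address space to the device so that
 * submitted descriptors can use user virtual addresses directly. */
static int example_sva_bind(struct device *dev, u32 *pasid)
{
	struct iommu_sva *handle;

	handle = iommu_sva_bind_device(dev, current->mm);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	/* DMA from the device is tagged with this PASID and translated
	 * through the process page tables by the IOMMU. */
	*pasid = iommu_sva_get_pasid(handle);
	return 0;
}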


* Challenges

== REFERENCES ==
[1] SDXI 1.0 specification, https://www.snia.org/sdxi
[2] SDXI device driver, https://github.com/AMDESE/linux-sdxi

Thanks,
-Wei

Thanks,

Jonathan






