Re: [RFC PATCH 01/12] dma-buf: Introduce dma_buf_get_pfn_unlocked() kAPI

On 15.01.25 at 16:10, Jason Gunthorpe wrote:
On Wed, Jan 15, 2025 at 03:30:47PM +0100, Christian König wrote:

Those rules are not something we came up with because of some limitation
of the DMA-API, but rather from experience working with different device
drivers and especially their developers.
I would say it stems from the use of scatter list. You do not have
enough information exchanged between exporter and importer to
implement something sane and correct. At that point being restrictive
is a reasonable path.

Because of scatterlists, developers don't have APIs that correctly solve
the problems they want to solve, so of course things get into a mess.

Well I completely agree that scatterlists have many many problems. And at least some of the stuff you note here sounds like a good idea to tackle those problems.

But I'm trying to explain the restrictions and requirements we previously found necessary. And I strongly think that any new approach needs to respect those restrictions as well or otherwise we will just repeat history.

Applying and enforcing those restrictions is an absolutely mandatory
must-have for extending DMA-buf.
You said to come to the maintainers with the problems, here are the
problems. Your answer is don't use dmabuf.

That doesn't make the problems go away :(

Yeah, that's why I'm desperately trying to understand your use case.

I really don't want to make a dmabuf2 - everyone would have to
implement it, including all the GPU drivers if they want to work with
RDMA. I don't think this makes any sense compared to incrementally
evolving dmabuf with more optional capabilities.
The point is that a dmabuf2 would most likely be rejected as well or
otherwise run into the same issues we have seen before.
You'd need to be much more concrete and technical in your objections
to cause a rejection. "We tried something else before and it didn't
work" won't cut it.

Granted, let me try to improve this.

Here is a real world example of one of the issues we ran into and why CPU mappings of importers are redirected to the exporter.

We have a good bunch of different exporters which track the CPU mappings of their backing store using address_space objects in one way or another and then use unmap_mapping_range() to invalidate those CPU mappings.

But when importers get the PFNs of the backing store, they can look behind the curtain and directly insert those PFNs into the CPU page tables.

We had literally tons of cases like this where driver developers caused access-after-free issues because the importer created CPU mappings on its own without the exporter knowing about it.
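
To make that concrete, the pattern looks roughly like this. This is a made-up sketch, not code from any real driver - the structs and function names are invented, only unmap_mapping_range() and vmf_insert_pfn() are the real kernel APIs involved:

#include <linux/fs.h>
#include <linux/mm.h>

/* Invented exporter object, for illustration only */
struct exporter_bo {
        struct address_space *mapping;  /* tracks the exporter's CPU mappings */
        loff_t size;
};

/* Invented importer object, for illustration only */
struct importer_obj {
        unsigned long pfn;              /* PFN peeked from the exporter */
};

/*
 * Exporter: when the backing store moves or is freed, a single call is
 * enough to shoot down every CPU mapping the exporter knows about.
 */
static void exporter_evict(struct exporter_bo *bo)
{
        unmap_mapping_range(bo->mapping, 0, bo->size, 1);
        /* from here on the exporter assumes no CPU mapping of the old
         * backing store exists any more */
}

/*
 * Importer: builds its own CPU mapping from the raw PFN. This PTE lives
 * in the importer's own address_space, so exporter_evict() above never
 * finds it - after an eviction it points at freed or reused memory.
 */
static vm_fault_t importer_fault(struct vm_fault *vmf)
{
        struct importer_obj *obj = vmf->vma->vm_private_data;

        return vmf_insert_pfn(vmf->vma, vmf->address, obj->pfn);
}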

This is just one example of what we ran into. In addition to that, basically the whole synchronization between drivers was overhauled as well because we found that we can't trust importers to always do the right thing.

There is a very simple problem statement here, we need a FD handle for
various kinds of memory, with a lifetime model that fits a couple of
different use cases. The exporter and importer need to understand what
type of memory it is and what rules apply to working with it. The
required importers are more general than just simple PCI DMA.

I feel like this is already exactly DMABUF's mission.

Besides, you have been saying to go do this in TEE or whatever, how is
that any different from dmabuf2?

You can already turn both a TEE-allocated buffer as well as a memfd into a DMA-buf. So basically TEE and memfd already provide different interfaces which go beyond what DMA-buf does and allows.

In other words, if you want to do things like direct I/O to block or network devices, you can mmap() your memfd and do this while at the same time sending your memfd as a DMA-buf to your GPU, V4L or neural accelerator.
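
Roughly like the following untested userspace sketch (error handling omitted; note that udmabuf requires the memfd to be sealed against shrinking):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

int main(void)
{
        const size_t size = 4 << 20;    /* 4 MiB, page aligned */

        int memfd = memfd_create("buffer", MFD_ALLOW_SEALING);
        ftruncate(memfd, size);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        /* CPU side: a normal mapping, usable for read()/write()/direct I/O */
        void *cpu = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, memfd, 0);

        /* DMA-buf side: the same pages exported through /dev/udmabuf */
        int udev = open("/dev/udmabuf", O_RDWR);
        struct udmabuf_create req = {
                .memfd  = memfd,
                .flags  = UDMABUF_FLAGS_CLOEXEC,
                .offset = 0,
                .size   = size,
        };
        int dmabuf_fd = ioctl(udev, UDMABUF_CREATE, &req);

        /* dmabuf_fd can now be handed to a DMA-buf importer (GPU, V4L,
         * accelerator) while 'cpu' keeps working for ordinary file I/O */
        (void)cpu;
        return dmabuf_fd < 0;
}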

Would this be a way you could work with as well? E.g. you have your separate file descriptor representing the private MMIO which iommufd and KVM use, but you can turn it into a DMA-buf whenever you need to give it to a DMA-buf importer?
 
That sounds more like something for the TEE driver instead of anything
DMA-buf should be dealing with.
Has nothing to do with TEE.
Why?
The Linux TEE framework is not used as part of confidential compute.

CC already has guest memfd for holding its private CPU memory.
Where is that coming from and how is it used?
What do you mean? guest memfd is the result of years of negotiation in
the mm and x86 arch subsystems :( It is used like a normal memfd, and
we now have APIs in KVM and iommufd to directly intake and map from a
memfd. I expect guestmemfd will soon grow some more generic
dmabuf-like lifetime callbacks to avoid pinning - it already has some
KVM specific APIs IIRC.

But it is 100% exclusively focused on CPU memory and nothing else.

I have seen patches for that flying by on mailing lists and have a high-level understanding of what it's supposed to do, but never really looked more deeply into the code.
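
My high-level understanding of the uAPI side is roughly the following sketch (from memory, structs as in include/uapi/linux/kvm.h of recent kernels, details not verified; the ioctls only exist for VM types which support private memory):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int create_private_memslot(int vm_fd, uint64_t gpa,
                                  uint64_t size, uint64_t shared_va)
{
        /* Allocate guest-private memory; the resulting fd is not
         * mmap()-able by the host for the private case. */
        struct kvm_create_guest_memfd gmem = {
                .size  = size,
                .flags = 0,
        };
        int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

        /* Bind it to a memslot; shared accesses still go through the
         * normal userspace mapping, private ones through the guest_memfd. */
        struct kvm_userspace_memory_region2 region = {
                .slot               = 0,
                .flags              = KVM_MEM_GUEST_MEMFD,
                .guest_phys_addr    = gpa,
                .memory_size        = size,
                .userspace_addr     = shared_va,
                .guest_memfd        = gmem_fd,
                .guest_memfd_offset = 0,
        };
        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}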

This is about confidential MMIO memory.
Who is the exporter and who is the importer of the DMA-buf in this use
case?
In this case Xu is exporting MMIO from VFIO and importing to KVM and
iommufd.

So basically a portion of a PCIe BAR is imported into iommufd?

This is also not just about the KVM side; the VM side also has issues
with DMABUF and CC - only co-operating devices can interact with the
VM side's "encrypted" memory, and there needs to be a negotiation as
part of all buffer setup about what the mutual capability is. :\
swiotlb hides some of this sometimes, but confidential P2P is
currently unsolved.
Yes and it is documented by now how that is supposed to happen with
DMA-buf.
I doubt that. It is complex and not fully solved in the core code
today. Many scenarios do not work correctly, devices don't even exist
yet that can exercise the hard paths. This is a future problem :(

Let's just say that both the ARM guys as well as the GPU people already have some pretty "interesting" ways of doing digital rights management and content protection.

Regards,
Christian.


Jason

