On 08.01.25 at 15:58, Jason Gunthorpe wrote:
On Wed, Jan 08, 2025 at 02:44:26PM +0100, Christian König wrote:
Having the importer do the mapping is the correct way to operate the
DMA API and the new API that Leon has built to fix the scatterlist
abuse in dmabuf relies on importer mapping as part of its
construction.
That is exactly the point I strongly disagree with.
DMA-buf works by providing DMA addresses the importer can work with and
*NOT* the underlying location of the buffer.
The expectation is that the DMA API will be used to DMA map (most)
things, and the DMA API always works with a phys_addr_t/pfn
argument. Basically, everything that is not a private address space
should be supported by improving the DMA API. We are on course for
finally getting all the common cases like P2P and MMIO solved
here. That alone will take care of a lot.
Well, from experience the DMA API has failed more often than it actually
worked in the way required by drivers.
In particular, trying to hide architectural complexity in there instead
of properly exposing limitations to drivers is not something I consider
a good design approach.
So I am extremely critical of putting even more into it.
For P2P cases we are going toward (PFN + P2P source information) as
input to the DMA API. The additional "P2P source information" provides
a good way for co-operating drivers to represent private address
spaces as well. Both importer and exporter can have full understanding
what is being mapped and do the correct things, safely.
I can say from experience that this is clearly not going to work for all
use cases.
It would mean that we have to pull a massive amount of driver specific
functionality into the DMA API.
Things like programming access windows for PCI BARs are completely
driver specific and, as far as I can see, can't be part of the DMA API
without things like callbacks.
With that in mind the DMA API would become a mid layer between different
drivers, and that is really not something you are suggesting, is it?
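To make the callback concern concrete, here is a minimal user-space
sketch, not kernel code. It assumes a generic mapping layer that takes
"(PFN + P2P source information)" as input. All names (struct p2p_source,
struct map_request, toy_dma_map, toy_bar_window, TOY_PAGE_SHIFT) are
hypothetical and exist only for illustration. It shows that as soon as
the source needs driver-specific setup, such as programming a BAR access
window, the generic layer ends up calling back into the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical "P2P source information": who owns the pages. */
struct p2p_source {
    const char *name;
    /*
     * Driver-specific hook, e.g. to program a BAR access window.
     * Returns 0 on success and stores the bus address in *bus_addr.
     */
    int (*program_window)(struct p2p_source *src, uint64_t pfn,
                          uint64_t *bus_addr);
    uint64_t window_base; /* toy state used by the example hook */
};

/* Hypothetical input to the generic layer: a PFN plus its source. */
struct map_request {
    uint64_t pfn;
    struct p2p_source *src; /* NULL means ordinary system memory */
};

/* Toy generic mapper: identity-maps system memory, defers P2P to the driver. */
static int toy_dma_map(const struct map_request *req, uint64_t *dma_addr)
{
    assert(req != NULL && dma_addr != NULL);

    if (req->src == NULL) {
        *dma_addr = req->pfn << TOY_PAGE_SHIFT;
        return 0;
    }
    /* The generic layer cannot do this step itself; it needs a callback. */
    if (req->src->program_window == NULL)
        return -1;
    return req->src->program_window(req->src, req->pfn, dma_addr);
}

/* Example driver hook: place the page inside an already opened window. */
static int toy_bar_window(struct p2p_source *src, uint64_t pfn,
                          uint64_t *bus_addr)
{
    *bus_addr = src->window_base + (pfn << TOY_PAGE_SHIFT);
    return 0;
}
```

In this toy model the generic layer is exactly the mid layer between two
drivers described above: it only forwards the request to whatever hook
the source driver registered.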
So, no, we don't lose private address space support when moving to
importer mapping, in fact it works better because the importer gets
more information about what is going on.
Well, sounds like I wasn't able to voice my concern. Let me try again:
We should not give importers information they don't need. Especially not
information about the backing store of buffers.
So importers getting more information about what's going on is a bad thing.
I have imagined a staged approach where DMABUF gets a new API that
works with the new DMA API to do importer mapping with "P2P source
information" and a gradual conversion.
To make it clear: as maintainer of that subsystem I would reject such a
step with all I have.
We have already gone down that road; it didn't work at all, and it was
a really big pain to pull people back from it.
Exporter mapping falls down in too many cases already:
1) Private address spaces don't work fully well because many devices
need some indication what address space is being used and scatter list
can't really properly convey that. If the DMABUF has a mixture of CPU
and private it becomes a PITA
Correct, yes. That's why I said that scatterlist was a bad choice for
the interface.
But exposing the backing store to importers and then letting them do
whatever they want with it sounds like an even worse idea.
2) Multi-path PCI can require the importer to make mapping decisions
unique to the device and program device specific information for the
multi-path. We are doing this in mlx5 today and have hacks because
DMABUF is destroying the information the importer needs to choose the
correct PCI path.
That's why the exporter gets the struct device of the importer so that
it can plan how those accesses are made. Where exactly is the problem
with that?
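As a rough illustration of that model, here is a minimal user-space
sketch, loosely modelled on the attach-then-map flow of
dma_buf_attach() and dma_buf_map_attachment(). Everything in it
(struct toy_device, struct toy_buffer, struct toy_attachment,
toy_attach, toy_map_attachment, the pci_segment field) is a hypothetical
stand-in, not the kernel API. The exporter sees the importing device at
attach time, picks the path, and hands back only an address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the importer's struct device. */
struct toy_device {
    const char *name;
    int pci_segment; /* toy notion of "where the device sits" */
};

/* Exporter-private buffer state; importers never look at these fields. */
struct toy_buffer {
    int pci_segment;          /* where the backing store lives */
    uint64_t p2p_bus_addr;    /* address usable by peers on the same segment */
    uint64_t sysmem_dma_addr; /* address of the system-memory path */
};

/* Records which importer attached to which buffer. */
struct toy_attachment {
    const struct toy_buffer *buf;
    const struct toy_device *importer;
};

/* Attach: the exporter learns which device wants access. */
static struct toy_attachment toy_attach(const struct toy_buffer *buf,
                                        const struct toy_device *importer)
{
    struct toy_attachment att = { buf, importer };

    assert(buf != NULL && importer != NULL);
    return att;
}

/* Map: the exporter chooses the path; the importer only gets an address. */
static uint64_t toy_map_attachment(const struct toy_attachment *att)
{
    if (att->importer->pci_segment == att->buf->pci_segment)
        return att->buf->p2p_bus_addr;
    return att->buf->sysmem_dma_addr;
}
```

The path decision here is deliberately trivial. The point is only that
it is made by the exporter, using the importer's device, without
exposing the backing store.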
When you have a use case which is not covered by the existing DMA-buf
interfaces then please voice that to me and other maintainers instead of
implementing some hack.
3) Importing devices need to know if they are working with PCI P2P
addresses during mapping because they need to do things like turn on
ATS on their DMA. As for multi-path we have the same hacks inside mlx5
today that assume DMABUFs are always P2P because we cannot determine
if things are P2P or not after being DMA mapped.
Why would you need ATS on PCI P2P and not for system memory accesses?
4) TPH bits need to be programmed into the importer device but are
derived based on the NUMA topology of the DMA target. The importer has
no idea what the DMA target actually was because the exporter mapping
destroyed that information.
Yeah, but again that is completely intentional.
I assume you mean TLP processing hints when you say TPH and those should
be part of the DMA addresses provided by the exporter.
An importer trying to look behind the curtain and determine the NUMA
placement and topology itself is clearly a no-go from the design
perspective.
5) iommufd and kvm are both using CPU addresses without DMA. No
exporter mapping is possible
We have customers using both KVM and XEN with DMA-buf, so I can clearly
confirm that this isn't true.
Regards,
Christian.
Jason