On Thu, Jan 16, 2025 at 04:13:13PM +0100, Christian König wrote:
>> But this, fundamentally, is importers creating attachments and then
>> *ignoring the lifetime rules of DMABUF*. If you created an attachment,
>> got a move and *ignored the move* because you put the PFN in your own
>> VMA, then you are not following the attachment lifetime rules!
>
> Move notify is solely for informing the importer that they need to
> re-fresh their DMA mappings and eventually block for ongoing DMA to
> end.

I feel that it is a bit pedantic to say DMA and CPU are somehow
different. The DMABUF API gives you a scatterlist; it is reasonable to
say that a move invalidates the entire scatterlist, CPU and DMA
equally.

> This semantics doesn't work well for CPU mappings because you need to
> hold the reservation lock to make sure that the information stay valid
> and you can't hold a lock while returning from a page fault.

Sure, I imagine hooking up a VMA is hard - but that doesn't change my
point. The semantics can be reasonable and well defined.

> Yeah and exactly that is something we don't want to allow because it
> means that every importer need to get things right to prevent exporters
> from running into problems.

You can make the same argument about the DMA address. We should just
get rid of DMABUF entirely because people are going to mis-use it and
wrongly implement the invalidation callback.

I have no idea why GPU drivers want to implement mmap of dmabuf, that
seems to be a uniquely GPU thing. We are not going to be doing stuff
like that in KVM and other places. And we can implement the
invalidation callback with correct locking (see the sketch below). Why
should we all be punished because DRM drivers seem to have this weird
historical mmap problem?

I don't think that is a reasonable way to approach building a general
purpose Linux kernel API.

> Well it's not miss-used, it's just a very bad design decision to let
> every importer implement functionality which actually belong into a
> single point in the exporter.

Well, this is the problem. Sure, it may be that importers should not
implement mmap - but using the PFN side address is needed for more than
just mmap!

DMA mapping belongs in the importer, and the new DMA API makes this
even more explicit by allowing the importer a lot of options to
optimize the process of building the HW datastructures.

The scatterlist, and the representation of the DMA list it enforces, is
very inefficient and we are working to get rid of it. It isn't going to
be replaced by any sort of list of DMA addresses though.

If you really disagree you can try to convince the NVMe people to give
up the optimizations the new DMA API allows, so that DRM can prevent
this code-review problem. I also want the same optimizations in RDMA,
and I am not convinced giving them up is a worthwhile tradeoff.

> Why would you want to do a dmabuf2 here?

Because I need the same kind of common framework. I need to hook VFIO
to RDMA as well. I need to fix RDMA to have working P2P in all cases. I
need to hook KVM virtual device stuff to iommufd. Someone else needs
VFIO to hook into DRM. How many different times do I need to implement
a buffer sharing lifetime model?

No, we should not make a VFIO specific thing; we need a general tool to
do this properly and cover all the different use cases. That's
"dmabuf2" or whatever you want to call it. There are more than enough
use cases to justify doing this.

To be clear, I think actually building a second thing is a bad idea. We
do not need two things; we should have dmabuf handle all the use cases
people have, not just DRM's.
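To make the invalidation point above concrete, here is a minimal sketch
of how a non-mmap importer (KVM/iommufd style) could honour move_notify
with correct locking, treating CPU-side and DMA-side state identically.
Only the dma_buf_attach_ops / dma_resv pieces are the existing dma-buf
API; struct my_importer and my_importer_revoke() are hypothetical
stand-ins for importer-private state:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/* Hypothetical importer-private state derived from one attachment. */
struct my_importer {
	struct dma_buf_attachment *attach;
	/* DMA mappings, cached physical addresses, etc. live here */
};

/*
 * Hypothetical helper: tear down every address derived from the
 * attachment and fence off in-flight users before returning.
 */
static void my_importer_revoke(struct my_importer *imp)
{
}

static void my_move_notify(struct dma_buf_attachment *attach)
{
	struct my_importer *imp = attach->importer_priv;

	/* The exporter calls move_notify with its reservation lock held. */
	dma_resv_assert_held(attach->dmabuf->resv);

	/*
	 * A move invalidates everything derived from the attachment,
	 * DMA mappings and CPU-side state alike.  The next access
	 * re-establishes mappings under the reservation lock.
	 */
	my_importer_revoke(imp);
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true,
	.move_notify = my_move_notify,
};

The importer would pass &my_attach_ops plus its private state to
dma_buf_dynamic_attach(); nothing here requires an mmap path in the
importer.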
> I don't mind improving the scatterlist approach in any way possible.
> I'm just rejecting things which we already tried and turned out to be a
> bad idea.
>
> If you make an interface which gives DMA addresses plus additional
> information like address space, access hints etc.. to importers that
> would be really welcomed.

This is not welcomed; having lists of DMA addresses is inefficient and
does not match the direction of the DMA API. We are trying very hard to
completely remove the lists of DMA addresses in common fast paths.

> But exposing PFNs and letting the importers created their DMA mappings
> themselves and making CPU mappings themselves is an absolutely clear
> no-go.

Again, this is what we must have to support the new DMA API and the KVM
and IOMMUFD use cases I mentioned.

>> In this case Xu is exporting MMIO from VFIO and importing to KVM and
>> iommufd.
>
> So basically a portion of a PCIe BAR is imported into iommufd?

Yeah, and KVM. And RDMA.

> Then create an interface between VFIO and KVM/iommufd which allows to
> pass data between these two.
>
> We already do this between DMA-buf exporters/importers all the time.
> Just don't make it general DMA-buf API.

I have no idea what this means. We'd need a new API linked to DMABUF
that would be optional and used by this part of the world. As I said
above, we could protect it with some module namespace so you can keep
it out of DRM (rough sketch at the end of this mail).

If you can agree to that then it seems fine..

>> Someone else had some use case where they wanted to put the VFIO MMIO
>> PCIe BAR into a DMABUF and ship it into a GPU driver for
>> somethingsomething virtualization but I didn't understand it.
>
> Yeah, that is already perfectly supported.

No, it isn't. Christoph is blocking DMABUF in VFIO because he does not
want the scatterlist abuses that dmabuf is doing to proliferate. We
already have some ARM systems where the naive way typical DMABUF
implementations set up P2P does not work; those systems have a PCI
offset. Getting this to be "perfectly supported" is why we are working
on all these aspects to improve the DMA API and remove the scatterlist
abuses.

>> In a certain sense CC is a TEE that is built using KVM instead of the
>> TEE subsystem. Using KVM and integrating with the MM brings a whole
>> set of unique challenges that TEE got to avoid..
>
> Please go over those challenges in more detail. I need to get a better
> understanding of what's going on here.
>
> E.g. who manages encryption keys, who raises the machine check on
> violations etc...

TEE broadly has Linux launch a secure world that does some private
work. The secure worlds tend to be very limited; they are not really
VMs and they don't run a full Linux inside.

CC broadly has the secure world exist at boot, launch Linux, and
provide services to Linux. The secure world enforces memory isolation
on Linux and generates faults on violations. KVM is the gateway to
launch new secure worlds, and the secure worlds are full VMs with all
the device emulation and more. CC is much more like Xen with its
hypervisor and DOM0 concepts.
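Going back to the module namespace idea, this is roughly the shape of
gating I have in mind for the physical-address side of the API. The
entry point name dma_buf_map_pfns() and the "DMA_BUF_PFN" namespace are
made up for illustration, and the exact EXPORT_SYMBOL_NS_GPL /
MODULE_IMPORT_NS spelling (quoted vs. bare namespace name) depends on
the kernel version:

/*
 * dma-buf core: export the new entry point in its own symbol namespace
 * so only modules that explicitly opt in can link against it.
 */
#include <linux/dma-buf.h>
#include <linux/errno.h>
#include <linux/export.h>
#include <linux/module.h>

long dma_buf_map_pfns(struct dma_buf_attachment *attach /* , ... */)
{
	return -EOPNOTSUPP;	/* placeholder body for the sketch */
}
EXPORT_SYMBOL_NS_GPL(dma_buf_map_pfns, "DMA_BUF_PFN");

/* In an importer that is allowed to use it (KVM, iommufd, ...): */
MODULE_IMPORT_NS("DMA_BUF_PFN");

DRM drivers simply never import the namespace, so the PFN flavour stays
out of their reach.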