On Tue, Apr 18, 2017 at 3:15 PM, Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
> On 18/04/17 03:36 PM, Dan Williams wrote:
>> On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe
>> <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
>>>>> I think this opens an even bigger can of worms..
>>>>
>>>> No, I don't think it does. You'd only shim when the target page is
>>>> backed by a device, not host memory, and you can figure this out by a
>>>> is_zone_device_page()-style lookup.
>>>
>>> The bigger can of worms is how do you meaningfully stack dma_ops.
>>
>> This goes back to my original comment to make this capability a
>> function of the pci bridge itself. The kernel has an implementation of
>> a dynamically created bridge device that injects its own dma_ops for
>> the devices behind the bridge. See vmd_setup_dma_ops() in
>> drivers/pci/host/vmd.c.
>
> Well the issue I think Jason is pointing out is that the ops don't
> stack. The map_* function in the injected dma_ops needs to be able to
> call the original map_* for any page that is not p2p memory. This is
> especially annoying in the map_sg function which may need to call a
> different op based on the contents of the sgl. (And please correct me if
> I'm not seeing how this can be done in the vmd example.)

Unlike the pci bus address offset case which I think is fundamental to
support since shipping archs do this today, I think it is ok to say
p2p is restricted to a single sgl that gets to talk to host memory or
a single device. That said, what's wrong with a p2p aware map_sg
implementation calling up to the host memory map_sg implementation on
a per sgl basis?

> Also, what happens if p2p pages end up getting passed to a device that
> doesn't have the injected dma_ops?

This goes back to limiting p2p to a single pci host bridge. If the p2p
capability is coordinated with the bridge rather than between the
individual devices then we have a central point to catch this case.

...of course this is all hand wavy until someone writes the code and
proves otherwise.

> However, the concept of replacing the dma_ops for all devices behind a
> supporting bridge is interesting and may be a good piece of the final
> solution.

It's at least a proof point for injecting special behavior for devices
behind a (virtual) pci bridge without needing to go touch a bunch of
drivers.
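
To make the "p2p aware map_sg calling up to the host memory map_sg on a
per sgl basis" idea a bit more concrete, here is a rough userspace model
of it. This is not kernel code and not a proposed patch: every type and
function name prefixed with mock_ is hypothetical, the is_p2p flag merely
stands in for an is_zone_device_page()-style lookup, and the bus offset
is an arbitrary number chosen for illustration. The only assumption
carried over from the discussion is the restriction that a single sgl is
either all host memory or all p2p memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. None of these types are the kernel's; they just
 * mimic the shape of a scatterlist and a dma_ops table. */
struct mock_sg_entry {
	unsigned long addr;     /* stand-in for the page's physical address */
	bool is_p2p;            /* stand-in for an is_zone_device_page()-style lookup */
	unsigned long dma_addr; /* filled in by map_sg */
};

struct mock_device;

struct mock_dma_ops {
	/* Returns the number of mapped entries, or 0 on failure. */
	int (*map_sg)(struct mock_device *dev, struct mock_sg_entry *sgl, int nents);
};

struct mock_device {
	const struct mock_dma_ops *ops;       /* ops currently in effect */
	const struct mock_dma_ops *saved_ops; /* host ops saved at injection time */
};

/* Hypothetical bus offset applied to peer memory, purely for illustration. */
#define MOCK_P2P_BUS_OFFSET 0x1000UL

/* Host-memory mapping: an identity mapping in this model. */
static int mock_host_map_sg(struct mock_device *dev, struct mock_sg_entry *sgl, int nents)
{
	(void)dev;
	for (int i = 0; i < nents; i++)
		sgl[i].dma_addr = sgl[i].addr;
	return nents;
}

/* Injected mapping: classify the whole sgl once, then either handle it
 * here (all p2p) or call up to the saved host ops (all host memory). */
static int mock_p2p_map_sg(struct mock_device *dev, struct mock_sg_entry *sgl, int nents)
{
	if (nents <= 0)
		return 0;

	/* Enforce the "one kind of memory per sgl" restriction. */
	for (int i = 1; i < nents; i++)
		if (sgl[i].is_p2p != sgl[0].is_p2p)
			return 0;

	if (!sgl[0].is_p2p)
		return dev->saved_ops->map_sg(dev, sgl, nents);

	for (int i = 0; i < nents; i++)
		sgl[i].dma_addr = sgl[i].addr + MOCK_P2P_BUS_OFFSET;
	return nents;
}

static const struct mock_dma_ops mock_host_ops = { .map_sg = mock_host_map_sg };
static const struct mock_dma_ops mock_p2p_ops = { .map_sg = mock_p2p_map_sg };

/* What a bridge might do for each device behind it: remember the
 * original ops, then swap in the p2p-aware ones. */
static void mock_inject_p2p_ops(struct mock_device *dev)
{
	dev->saved_ops = dev->ops;
	dev->ops = &mock_p2p_ops;
}
```

In this model a caller sets dev.ops to &mock_host_ops, calls
mock_inject_p2p_ops(&dev), and then goes through dev.ops->map_sg() as
before. An all-host sgl falls through to the original mapping, an
all-p2p sgl gets the offset applied, and a mixed sgl fails with 0. It
says nothing about the second question (p2p pages reaching a device
without the injected ops), which would still need the bridge-level
check.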