On Thu, Nov 05, 2020 at 06:29:21PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 05, 2020 at 01:23:57PM -0400, Jason Gunthorpe wrote:
> > But that depends on the calling driver doing this properly, and we
> > don't expose an API to get the PCI device of the struct ib_device
> > .. how does nvme even work here?
>
> The PCI p2pdma APIs walk the parent chains of a struct device until
> they find a PCI device.  And the ib_device eventually ends up there.

Hmm. This works for real devices like mlx5, but it means the three SW
devices will also resolve to a real PCI device that is not the DMA
device.

If nvme wants to do something like this it should walk from the
ibdev->dma_device, after these patches make dma_device NULL.

eg rxe is like:

 $ sudo rdma link add rxe0 type rxe netdev eth1
 lrwxrwxrwx 1 root root 0 Nov 5 17:34 /sys/class/infiniband/rxe0/device -> ../../../0000:00:09.0/

I think this is a bug, these virtual devices should have NULL
parents...

> > If we can't get here then why did you add the check to the unmap side?
>
> Because I added them to the map and unmap side, but forgot to commit
> the map side.  Mostly to be prepared for the case where we could
> end up there.  And thinking out loud I actually need to double check
> rdmavt if that is true there as well.  It certainly is for rxe and
> siw as I checked it on a live system.

rdmavt parents itself to the HFI/QIB PCI device, so the walk above
should also find a real PCI device.

> > The SW drivers can't handle PCI pages at all, they are going to try to
> > memcpy them or something else not __iomem, so we really do need to
> > prevent P2P pages going into them.
>
> Ok, let's prevent it for now.  And if someone wants to do it there
> they have to do all the work.

Yes, that is the safest - just block the SW devices from ever touching
P2P pages.

Jason
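
For reference, the parent-chain walk being discussed looks roughly like
this; a minimal sketch modelled on the p2pdma helper, written from
memory rather than quoted from the current tree, so take the exact
naming and refcounting with a grain of salt:

	/* Walk dev and its ancestors until a PCI device is found. */
	static struct pci_dev *find_parent_pci_dev(struct device *dev)
	{
		struct device *parent;

		dev = get_device(dev);
		while (dev) {
			if (dev_is_pci(dev))
				return to_pci_dev(dev); /* stops at the first PCI ancestor */

			parent = get_device(dev->parent);
			put_device(dev);
			dev = parent;
		}
		return NULL;
	}

Starting this walk from the ib_device's struct device is what resolves
rxe/siw to the unrelated PCI parent shown above; starting it from
ibdev->dma_device, once that is NULL for the SW devices, would fail
cleanly instead.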
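The "just block it" option could be as small as a guard in the SW
drivers' map path, something like the following; the placement is
hypothetical and 'sg' stands for whatever scatterlist entry the driver
is about to map, it is only meant to illustrate the check:

	/* rxe/siw memcpy payloads, so __iomem P2PDMA pages cannot be handled */
	if (is_pci_p2pdma_page(sg_page(sg)))
		return -EREMOTEIO;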