On Tue, Jan 29, 2019 at 11:02:25PM +0000, Jason Gunthorpe wrote: > On Tue, Jan 29, 2019 at 03:44:00PM -0500, Jerome Glisse wrote: > > > > But this API doesn't seem to offer any control - I thought that > > > control was all coming from the mm/hmm notifiers triggering p2p_unmaps? > > > > The control is within the driver implementation of those callbacks. > > Seems like what you mean by control is 'the exporter gets to choose > the physical address at the instant of map' - which seems reasonable > for GPU. > > > > will only allow p2p map to succeed for objects that have been tagged by the > > userspace in some way ie the userspace application is in control of what > > can be map to peer device. > > I would have thought this means the VMA for the object is created > without the map/unmap ops? Or are GPU objects and VMAs unrelated? GPU object and VMA are unrelated in all open source GPU driver i am somewhat familiar with (AMD, Intel, NVidia). You can create a GPU object and never map it (and thus never have it associated with a vma) and in fact this is very common. For graphic you usualy only have hand full of the hundreds of GPU object your application have mapped. The control for peer to peer can also be a mutable properties of the object ie userspace do ioctl on the GPU driver which create an object; Some times after the object is created the userspace do others ioctl to allow to export the object to another specific device again this result in ioctl to the device driver, those ioctl set flags and update GPU object kernel structure with all the info. In the meantime you have no control on when other driver might call the vma p2p call backs. So you must have register the vma with vm_operations that include the p2p_map and p2p_unmap. Those driver function will check the object kernel structure each time they get call and act accordingly. > > For moving things around after a successful p2p_map yes the exporting > > device have to call for instance zap_vma_ptes() or something > > similar. > > Okay, great, RDMA needs this flow for hotplug - we zap the VMA's when > unplugging the PCI device and we can delay the PCI unplug completion > until all the p2p_unmaps are called... > > But in this case a future p2p_map will have to fail as the BAR no > longer exists. How to handle this? So the comment above the callback (i should write more thorough guideline and documentation) state that export should/(must?) be predictable ie if an importer device calls p2p_map() once on a vma and it does succeed then if the same device calls again p2p_map() on the same vma and if the vma is still valid (ie no unmap or does not correspond to a different object ...) then the p2p_map() should/(must?) succeed. The idea is that the importer would do a first call to p2p_map() when it setup its own object, report failure to userspace if that fails. If it does succeed then we should never have an issue next time we call p2p_map() (after mapping being invalidated by mmu notifier for instance). So it will succeed just like the first call (again assuming the vma is still valid). Idea is that we can only ask exporter to be predictable and still allow them to fail if things are really going bad. > > > I would think that the importing driver can assume the BAR page is > > > kept alive until it calls unmap (presumably triggered by notifiers)? > > > > > > ie the exporting driver sees the BAR page as pinned until unmap. > > > > The intention with this patchset is that it is not pin ie the importer > > device _must_ abide by all mmu notifier invalidations and they can > > happen at anytime. The importing device can however re-p2p_map the > > same range after an invalidation. > > > > I would like to restrict this to importer that can invalidate for > > now because i believe all the first device to use can support the > > invalidation. > > This seems reasonable (and sort of says importers not getting this > from HMM need careful checking), was this in the comment above the > ops? I think i put it in the comment above the ops but in any cases i should write something in documentation with example and thorough guideline. Note that there won't be any mmu notifier to mmap of a device file unless the device driver calls for it or there is a syscall like munmap or mremap or mprotect well any syscall that work on vma. So assuming the application is no doing something stupid, nor the driver. Then the result of p2p_map can stay valid until the importer is done and call p2p_unmap of its own free will. This is what i do expect for this. But for GPU i would still like to allow GPU driver to evict (and thus invalidate importer mapping) to main memory or defragment their BAR address space if the GPU driver feels a pressing need to do so. If we ever want to support full pin then we might have to add a flag so that GPU driver can refuse an importer that wants things pin forever. Cheers, Jérôme