On Mon, Nov 19, 2018 at 05:14:05PM +0800, Kenneth Lee wrote:
> If the hardware cannot share page table with the CPU, we then need to have
> some way to change the device page table. This is what happens in ODP. It
> invalidates the page table in device upon mmu_notifier call back. But this cannot
> solve the COW problem: if the user process A shares a page P with device, and A
> forks a new process B, and it continues to write to the page. By COW, the
> process B will keep the page P, while A will get a new page P'. But you have
> no way to let the device know it should use P' rather than P.

Is this true? I thought mmu_notifiers covered all these cases.

The mmu_notifier for A should fire if B causes the physical address of
A's pages to change via COW.

And this causes the device page tables to re-synchronize.

> In WarpDrive/uacce, we make this simple. If you have an IOMMU and it supports
> SVM/SVA, everything will be fine, just like ODP implicit mode. And you don't need
> to write any code for that, because it has been done by the IOMMU framework. If it

Looks like the IOMMU code uses mmu_notifier, so it is identical to IB's
ODP. The only difference is that IB tends to have the IOMMU page table
in the device, not in the CPU.

The only case I know of that is different is the new-fangled CAPI
stuff, where the IOMMU can directly use the CPU's page table and the
IOMMU page table (in device or CPU) is eliminated.

Anyhow, I don't think a single instance of hardware should justify an
entire new subsystem. Subsystems are hard to make, and without multiple
hardware examples there is no way to expect that it would cover any
future use cases.

If all your driver needs is to mmap some PCI BAR space, route
interrupts and do DMA mapping, then mediated VFIO is probably a good
choice.
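To make the mmu_notifier/COW argument concrete, here is a toy single-page model. Everything in it (toy_mm, dev_tlb, dev_invalidate, cow_write_fault, run_scenario) is hypothetical and only mirrors the idea; it is not the kernel's mmu_notifier API. It assumes the device is bound only to A's mm and caches one translation, which it drops when the notifier fires and re-reads from the CPU page table on its next access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-page model, NOT kernel code or the real
 * mmu_notifier API: one CPU PTE per mm, a device-side cached
 * translation, and a callback standing in for an mmu_notifier.
 */
typedef unsigned long toy_pfn_t;

struct dev_tlb {
	bool valid;
	toy_pfn_t pfn;
};

struct toy_mm {
	toy_pfn_t pte;                            /* CPU translation of the page */
	void (*invalidate)(struct dev_tlb *tlb);  /* notifier, NULL if none */
	struct dev_tlb *tlb;                      /* device bound to this mm */
};

static toy_pfn_t next_free_pfn = 100;

/* Notifier callback: the device drops its cached translation. */
static void dev_invalidate(struct dev_tlb *tlb)
{
	tlb->valid = false;
}

/* Device access: on a miss the device "faults" and re-reads the CPU PTE. */
static toy_pfn_t dev_translate(struct toy_mm *mm)
{
	if (!mm->tlb->valid) {
		mm->tlb->pfn = mm->pte;
		mm->tlb->valid = true;
	}
	return mm->tlb->pfn;
}

/* fork(): the child maps the same physical page; only the parent has the device. */
static struct toy_mm toy_fork(const struct toy_mm *parent)
{
	struct toy_mm child = { .pte = parent->pte, .invalidate = NULL, .tlb = NULL };
	return child;
}

/* Write fault on a COW-shared page: fire the notifier, then install a private copy. */
static void cow_write_fault(struct toy_mm *mm)
{
	if (mm->invalidate)
		mm->invalidate(mm->tlb);
	mm->pte = next_free_pfn++;
}

/* Kenneth's scenario: A shares P with the device, forks B, then A writes. */
static toy_pfn_t run_scenario(toy_pfn_t *child_pfn)
{
	struct dev_tlb tlb = { .valid = false, .pfn = 0 };
	struct toy_mm a = { .pte = 1, .invalidate = dev_invalidate, .tlb = &tlb };

	assert(dev_translate(&a) == 1);   /* device caches P */
	struct toy_mm b = toy_fork(&a);
	cow_write_fault(&a);              /* A gets P', notifier fires */
	*child_pfn = b.pte;               /* B keeps P */
	return dev_translate(&a);         /* device re-faults and sees P' */
}
```

Calling run_scenario() once should report that B still maps P while the device now translates to A's new page P'. In the real kernel the COW write-fault path invokes the invalidate_range_start/end notifier hooks on the faulting mm; the sketch only models the resulting ordering (invalidate, remap, device re-fault).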
If it needs to do a bunch of other stuff, not related to PCI BAR space,
interrupts and DMA mapping (i.e. special code for compression, crypto,
AI, whatever), then you should probably do what Jerome said and make a
drivers/char/hisillicon_foo_bar.c that exposes just what your hardware
does.

If you have networking involved in here, then consider RDMA,
particularly if this functionality is already part of the same
hardware that the hns infiniband driver is servicing. 'Computational
MRs' are a reasonable approach to a side-car offload of already
existing RDMA support.

Jason