On Sun, Jul 22, 2018 at 01:21:06PM -0700, Linus Torvalds wrote: > On Sat, Jul 21, 2018 at 3:49 PM Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > IOW, Kirill's patch now just boils down to the attached trial patch. > > Side note: we have a few rdma drivers that mess with vm_ops. > > One does so intentionally, see mlx4_ib_vma_open() in > > drivers/infiniband/hw/mlx4/main.c Both drivers should be doing this for identical reasons, the long comment in mlx4/main.c appears to be the rational and expected behavior.. mlx5/main.c also does this. Most likely this entire process should be part of the core support library and not open coded like this in drivers. > and one *may* be intentional but it's not entirely clear: > hns_roce_vma_open() may bve the same issue, but > hns_roce_disassociate_ucontext() just looks pointless in > > drivers/infiniband/hw/hns/hns_roce_main.c All three drivers should have code like this as well, it is triggered during something like a PCI hot unplug, What the drivers are doing is mapping PCI BAR memory in response to mmap(). When the device becomes unplugged then the BAR memory map is to be revoked and replaced with a zero page, so the physical device can be physically un-plugged, or hard reset. (ie access to the BAR memory may now MCE or something) This is probably the strongest reason why these pages need to be protected, as the driver must be able to hunt down all user space mappings of the BAR page and do this zeroing, so "don't copy" and other flags seem necessary with the current vma tracking approach. Would be glad to hear if this algorithm with zap_vma_ptes makes any sense or is totally wrong. However, I believe, at least for mlx4/5, it is actually actively tested and does work in the straightfoward case. > Afaik, they should just use VM_DONTCOPY to not see the fork open > case, VM_DONTEXPAND to fail a size-changing mremap, and set a > vm_ops->split() function that returns an error. This does makes more sense, I certainly would prefer well defined flags over this strange manipulation. If you say this approach will do what is needed I'll ask the driver maintainer to code and test with it.. Jason