> On Dec 18, 2021, at 10:42 AM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Fri, Dec 17, 2021 at 07:38:39PM -0800, Linus Torvalds wrote:
>> On Fri, Dec 17, 2021 at 7:30 PM Nadav Amit <namit@xxxxxxxxxx> wrote:
>>>
>>> In such a case, I do think it makes sense to fail uffd-wp (when
>>> page_count() > 1), and in a prototype I am working on I do something
>>> like that.
>>
>> Ack. If uddf-wp finds a page that is pinned, just skip it as not
>> write-protectable.
>>
>> Because some of the pinners might be writing to it, of course - just
>> not through the page tables.
>
> That doesn't address the qemu use case though. The RDMA pin is the
> 'coherent r/o pin' we discussed before, which requires that the pages
> remain un-write-protected and the HW DMA is read only.
>
> The VFIO pin will enable dirty page tracking in the system IOMMU so it
> gets the same effect from qemu's perspective as the CPU WP is doing.
>
> In these operations every single page of the guest will be pinned, so
> skip it just means userfault fd wp doesn't work at all.
>
> Qemu needs some solution to be able to dirty track the CPU memory for
> migration..

My bad. I misunderstood the scenario.

Yes, I guess that you pin the pages early for RDMA registration, which
is also something you may do for io_uring buffers. Skipping pinned pages
would then render userfaultfd write-protection unusable.

I do not see how this can be solved without custom, potentially
complicated logic, which is exactly what the page_count() approach is
meant to avoid.

The only thing I can think of is requiring the pinned regions to be
madvise()'d with MADV_DONTFORK first, and not COW'ing in such a case.
This would break existing code, though.