Hi Alistair, David, Jason,

> >>>>>> Right, the "the zero pages are changed into writable pages" in
> >>>>>> your above comment just might not apply, because there won't be
> >>>>>> any page replacement (hopefully :) ).
> >>>>
> >>>>> If the page replacement does not happen when there are new writes
> >>>>> to the area where the hole previously existed, then would we still
> >>>>> get an invalidate when this happens? Is there any other way to get
> >>>>> notified when the zeroed page is written to if the invalidate does
> >>>>> not get triggered?
> >>>>
> >>>> What David is saying is that memfd does not use the zero page
> >>>> optimization for hole punches. Any access to the memory, including
> >>>> read-only access through hmm_range_fault() will allocate unique
> >>>> pages. Since there is no zero page and no zero-page replacement there
> >>>> is no issue with invalidations.
> >>
> >>> It looks like even with hmm_range_fault(), the invalidate does not get
> >>> triggered when the hole is refilled with new pages because of writes.
> >>> This is probably because hmm_range_fault() does not fault in any pages
> >>> that get invalidated later when writes occur.
> >> hmm_range_fault() returns the current content of the VMAs, or it
> >> faults. If it returns pages then it came from one of these two places.
> >> If your VMA is incoherent with what you are doing then you have
> >> bigger problems, or maybe you found a bug.
>
> Note it will only fault in pages if HMM_PFN_REQ_FAULT is specified. You
> are setting that however you aren't setting HMM_PFN_REQ_WRITE which is
> what would trigger a fault to bring in the new pages. Does setting that
> fix the issue you are seeing?

No, adding HMM_PFN_REQ_WRITE still doesn't help in fixing the issue.
Although, I do not have THP enabled (or built-in), shmem does not evict
the pages after hole punch as noted in the comment in shmem_fallocate():

        if ((u64)unmap_end > (u64)unmap_start)
                unmap_mapping_range(mapping, unmap_start,
                                    1 + unmap_end - unmap_start, 0);
        shmem_truncate_range(inode, offset, offset + len - 1);
        /* No need to unmap again: hole-punching leaves COWed pages */

As a result, the pfn is still valid and the pte is pte_present() and
pte_write(). This is the reason why adding in HMM_PFN_REQ_WRITE does not
help; because, it fails the below condition in hmm_pte_need_fault():

        if ((pfn_req_flags & HMM_PFN_REQ_WRITE) &&
            !(cpu_flags & HMM_PFN_WRITE))
                return HMM_NEED_FAULT | HMM_NEED_WRITE_FAULT;

If I force it to read-fault or write-fault (by hacking
hmm_pte_need_fault()), it gets indefinitely stuck in the do while loop in
hmm_range_fault(). AFAIU, unless there is a way to fault-in zero pages (or
any scratch pages) after hole punch that get invalidated because of
writes, I do not see how using hmm_range_fault() can help with my
use-case.

Thanks,
Vivek

>
> >>> The above log messages are seen immediately after the hole is punched.
> >>> As you can see, hmm_range_fault() returns the pfns of old pages and
> >>> not zero pages. And, I see the below messages (with patch #2 in this
> >>> series applied) as the hole is refilled after writes:
> >> I don't know what you are doing, but it is something wrong or you've
> >> found a bug in the memfds.
> >
> >
> > Maybe THP is involved? I recently had to dig that out for an internal
> > discussion:
> >
> > "Currently when truncating shmem file, if the range is partial of THP
> > (start or end is in the middle of THP), the pages actually will just get
> > cleared rather than being freed unless the range cover the whole THP.
> > Even though all the subpages are truncated (randomly or sequentially),
> > the THP may still be kept in page cache. This might be fine for some
> > usecases which prefer preserving THP."
> >
> > My recollection is that this behavior was never changed.
> >
> > https://lore.kernel.org/all/1575420174-19171-1-git-send-email-yang.shi@xxxxxxxxxxxxxxxxx/
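
For reference, the hmm_pte_need_fault() check discussed above can be
modeled in userspace roughly as follows. This is only a simplified sketch:
the MODEL_* names and bit values are placeholders made up for this example,
not the kernel's definitions, and as far as I can tell the real function
also folds in the range's default flags and mask before testing. It shows
why a pte that is still present and writable after the hole punch never
asks for a fault, even with the write request set.

```c
#include <assert.h>

/* Placeholder bits for this model only; NOT the kernel's HMM_PFN_* values. */
#define MODEL_PFN_VALID         (1UL << 0)  /* pte is present */
#define MODEL_PFN_WRITE         (1UL << 1)  /* pte is writable */
#define MODEL_REQ_FAULT         (1UL << 2)  /* caller wants pages faulted in */
#define MODEL_REQ_WRITE         (1UL << 3)  /* caller wants write access */

#define MODEL_NEED_FAULT        (1U << 0)
#define MODEL_NEED_WRITE_FAULT  (1U << 1)

/* Simplified model of the decision made for a single pte. */
unsigned int model_pte_need_fault(unsigned long pfn_req_flags,
                                  unsigned long cpu_flags)
{
        /* Caller did not ask for faulting: report whatever is mapped. */
        if (!(pfn_req_flags & MODEL_REQ_FAULT))
                return 0;

        /* Write access requested, but the pte is not writable. */
        if ((pfn_req_flags & MODEL_REQ_WRITE) &&
            !(cpu_flags & MODEL_PFN_WRITE))
                return MODEL_NEED_FAULT | MODEL_NEED_WRITE_FAULT;

        /* Nothing mapped at all: a plain fault is needed. */
        if (!(cpu_flags & MODEL_PFN_VALID))
                return MODEL_NEED_FAULT;

        return 0;
}

/* Walks through the cases relevant to this thread. */
void model_selfcheck(void)
{
        unsigned long req = MODEL_REQ_FAULT | MODEL_REQ_WRITE;

        /* Present and writable (what I see after the hole punch): no fault. */
        assert(model_pte_need_fault(req,
                                    MODEL_PFN_VALID | MODEL_PFN_WRITE) == 0);

        /* Present but read-only: a write fault is requested. */
        assert(model_pte_need_fault(req, MODEL_PFN_VALID) ==
               (MODEL_NEED_FAULT | MODEL_NEED_WRITE_FAULT));

        /* Not present, read-only request: a plain fault is requested. */
        assert(model_pte_need_fault(MODEL_REQ_FAULT, 0) == MODEL_NEED_FAULT);
}
```

Calling model_selfcheck() passes silently; the first assertion is the case
described above, where the stale pte satisfies the request and no fault
(and hence no fresh page) results.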