David Hildenbrand <david@xxxxxxxxxx> writes:

> On 03.08.23 14:14, Jason Gunthorpe wrote:
>> On Thu, Aug 03, 2023 at 07:35:51AM +0000, Kasireddy, Vivek wrote:
>>> Hi Jason,
>>>
>>>>>> Right, the "the zero pages are changed into writable pages" in your
>>>>>> above comment just might not apply, because there won't be any page
>>>>>> replacement (hopefully :) ).
>>>>
>>>>> If the page replacement does not happen when there are new writes to the
>>>>> area where the hole previously existed, then would we still get an
>>>>> invalidate when this happens? Is there any other way to get notified when
>>>>> the zeroed page is written to if the invalidate does not get triggered?
>>>>
>>>> What David is saying is that memfd does not use the zero page
>>>> optimization for hole punches. Any access to the memory, including
>>>> read-only access through hmm_range_fault(), will allocate unique
>>>> pages. Since there is no zero page and no zero-page replacement there
>>>> is no issue with invalidations.
>>
>>> It looks like even with hmm_range_fault(), the invalidate does not get
>>> triggered when the hole is refilled with new pages because of writes.
>>> This is probably because hmm_range_fault() does not fault in any pages
>>> that get invalidated later when writes occur.
>>
>> hmm_range_fault() returns the current content of the VMAs, or it
>> faults. If it returns pages then it came from one of these two places.
>> If your VMA is incoherent with what you are doing then you have bigger
>> problems, or maybe you found a bug.

Note it will only fault in pages if HMM_PFN_REQ_FAULT is specified. You
are setting that, but you aren't setting HMM_PFN_REQ_WRITE, which is
what would trigger a fault to bring in the new pages. Does setting that
fix the issue you are seeing? (A rough sketch of what I mean is at the
end of this mail.)

>>> The above log messages are seen immediately after the hole is punched. As
>>> you can see, hmm_range_fault() returns the pfns of old pages and not zero
>>> pages. And, I see the below messages (with patch #2 in this series applied)
>>> as the hole is refilled after writes:
>>
>> I don't know what you are doing, but it is something wrong or you've
>> found a bug in the memfds.
>
> Maybe THP is involved? I recently had to dig that out for an internal
> discussion:
>
> "Currently when truncating shmem file, if the range is partial of THP
> (start or end is in the middle of THP), the pages actually will just get
> cleared rather than being freed unless the range cover the whole THP.
> Even though all the subpages are truncated (randomly or sequentially),
> the THP may still be kept in page cache. This might be fine for some
> usecases which prefer preserving THP."
>
> My recollection is that this behavior was never changed.
>
> https://lore.kernel.org/all/1575420174-19171-1-git-send-email-yang.shi@xxxxxxxxxxxxxxxxx/
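
To make the HMM_PFN_REQ_WRITE point above concrete, here is a minimal
sketch of the kind of hmm_range_fault() call I have in mind. It assumes
the caller already has an mmu_interval_notifier registered over the
range; the helper name fault_in_writable() and the surrounding locking
are illustrative only, not the code from the series under discussion:

#include <linux/hmm.h>
#include <linux/mmu_notifier.h>
#include <linux/mm.h>

/*
 * Illustrative helper (not from the series under discussion): fault a
 * range in for write via hmm_range_fault(). HMM_PFN_REQ_FAULT makes
 * hmm_range_fault() fault pages in at all; HMM_PFN_REQ_WRITE requests a
 * write fault, so pages that replaced a punched hole after a later
 * write get brought in instead of being skipped.
 */
static int fault_in_writable(struct mmu_interval_notifier *notifier,
			     unsigned long start, unsigned long end,
			     unsigned long *hmm_pfns)
{
	struct hmm_range range = {
		.notifier	= notifier,
		.start		= start,
		.end		= end,
		.hmm_pfns	= hmm_pfns,
		.default_flags	= HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

	do {
		range.notifier_seq = mmu_interval_read_begin(notifier);
		mmap_read_lock(notifier->mm);
		ret = hmm_range_fault(&range);
		mmap_read_unlock(notifier->mm);
	} while (ret == -EBUSY);

	/*
	 * A real caller must still take its own lock and check
	 * mmu_interval_read_retry(notifier, range.notifier_seq) before
	 * consuming range.hmm_pfns, retrying if it returns true.
	 */
	return ret;
}

The -EBUSY retry loop mirrors what other hmm_range_fault() callers do
when the notifier sequence changes mid-walk.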