Hi Jason,

> > > Right, the "the zero pages are changed into writable pages" in your
> > > above comment just might not apply, because there won't be any page
> > > replacement (hopefully :) ).
> >
> > If the page replacement does not happen when there are new writes to the
> > area where the hole previously existed, then would we still get an
> > invalidate when this happens? Is there any other way to get notified
> > when the zeroed page is written to if the invalidate does not get
> > triggered?
>
> What David is saying is that memfd does not use the zero page
> optimization for hole punches. Any access to the memory, including
> read-only access through hmm_range_fault() will allocate unique
> pages. Since there is no zero page and no zero-page replacement there
> is no issue with invalidations.

It looks like, even with hmm_range_fault(), the invalidate does not get
triggered when the hole is refilled with new pages because of writes. This
is probably because hmm_range_fault() does not fault in any pages that get
invalidated later when the writes occur. I am not sure if there is a way to
request that it fill the hole with zero pages.
Here is what I have in the invalidate callback (added on top of this
series):

static bool invalidate_udmabuf(struct mmu_interval_notifier *mn,
			       const struct mmu_notifier_range *range_mn,
			       unsigned long cur_seq)
{
	struct udmabuf_vma_range *range = container_of(mn,
					struct udmabuf_vma_range, range_mn);
	struct udmabuf *ubuf = range->ubuf;
	struct hmm_range hrange = {0};
	unsigned long *pfns, num_pages, timeout;
	int i, ret;

	printk("invalidate; start = %lu, end = %lu\n",
	       range->start, range->end);

	hrange.notifier = mn;
	hrange.default_flags = HMM_PFN_REQ_FAULT;
	hrange.start = max(range_mn->start, range->start);
	hrange.end = min(range_mn->end, range->end);
	num_pages = (hrange.end - hrange.start) >> PAGE_SHIFT;

	pfns = kmalloc_array(num_pages, sizeof(*pfns), GFP_KERNEL);
	if (!pfns)
		return true;

	printk("invalidate; num pages = %lu\n", num_pages);

	hrange.hmm_pfns = pfns;
	timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);

	while (true) {
		hrange.notifier_seq = mmu_interval_read_begin(mn);

		mmap_read_lock(ubuf->vmm_mm);
		ret = hmm_range_fault(&hrange);
		mmap_read_unlock(ubuf->vmm_mm);
		if (ret) {
			if (ret == -EBUSY && !time_after(jiffies, timeout))
				continue;
			break;
		}

		/* Restart if the notifier sequence changed while faulting. */
		if (!mmu_interval_read_retry(mn, hrange.notifier_seq))
			break;
	}

	if (!ret) {
		for (i = 0; i < num_pages; i++) {
			printk("hmm returned page = %p; pfn = %lu\n",
			       hmm_pfn_to_page(pfns[i]),
			       pfns[i] & ~HMM_PFN_FLAGS);
		}
	}

	kfree(pfns);
	return true;
}

static const struct mmu_interval_notifier_ops udmabuf_invalidate_ops = {
	.invalidate = invalidate_udmabuf,
};

Here are the log messages I see when I run the udmabuf (shmem-based)
selftest:

[ 132.662863] invalidate; start = 140737347612672, end = 140737347629056
[ 132.672953] invalidate; num pages = 4
[ 132.676690] hmm returned page = 000000000483755d; pfn = 2595360
[ 132.682676] hmm returned page = 00000000d5a87cc6; pfn = 2588133
[ 132.688651] hmm returned page = 00000000f9eb8d20; pfn = 2673429
[ 132.694629] hmm returned page = 000000005b44da27; pfn = 2588481
[ 132.700605] invalidate; start = 140737348661248, end = 140737348677632
[ 132.710672] invalidate; num pages = 4
[ 132.714412] hmm returned page = 0000000002867206; pfn = 2680737
[ 132.720394] hmm returned page = 00000000778a48f0; pfn = 2680738
[ 132.726366] hmm returned page = 00000000d8adf162; pfn = 2680739
[ 132.732350] hmm returned page = 00000000671769ff; pfn = 2680740

The above log messages are seen immediately after the hole is punched. As
you can see, hmm_range_fault() returns the pfns of the old pages and not
zero pages. And I see the below messages (with patch #2 in this series
applied) as the hole is refilled after writes:

[ 160.279227] udpate mapping; old page = 000000000483755d; pfn = 2595360
[ 160.285809] update mapping; new page = 00000000080e9595; pfn = 2680991
[ 160.292402] udpate mapping; old page = 00000000d5a87cc6; pfn = 2588133
[ 160.298979] update mapping; new page = 000000000483755d; pfn = 2595360
[ 160.305574] udpate mapping; old page = 00000000f9eb8d20; pfn = 2673429
[ 160.312154] update mapping; new page = 00000000d5a87cc6; pfn = 2588133
[ 160.318744] udpate mapping; old page = 000000005b44da27; pfn = 2588481
[ 160.325320] update mapping; new page = 00000000f9eb8d20; pfn = 2673429
[ 160.333022] udpate mapping; old page = 0000000002867206; pfn = 2680737
[ 160.339603] update mapping; new page = 000000003e2e9628; pfn = 2674703
[ 160.346201] udpate mapping; old page = 00000000778a48f0; pfn = 2680738
[ 160.352789] update mapping; new page = 0000000002867206; pfn = 2680737
[ 160.359394] udpate mapping; old page = 00000000d8adf162; pfn = 2680739
[ 160.365966] update mapping; new page = 00000000778a48f0; pfn = 2680738
[ 160.372552] udpate mapping; old page = 00000000671769ff; pfn = 2680740
[ 160.379131] update mapping; new page = 00000000d8adf162; pfn = 2680739

FYI, I ran this experiment with the kernel (6.5.0 RC1) from drm-tip.

Thanks,
Vivek

>
> Jason