On Thu, 8 Apr 2021, Axel Rasmussen wrote: > On Tue, Apr 6, 2021 at 4:49 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Mon, Apr 05, 2021 at 10:19:17AM -0700, Axel Rasmussen wrote: ... > > > --- a/mm/userfaultfd.c > > > +++ b/mm/userfaultfd.c ... > > > + > > > + if (is_file_backed) { > > > + /* The shmem MAP_PRIVATE case requires checking the i_size */ > > > + inode = dst_vma->vm_file->f_inode; > > > + offset = linear_page_index(dst_vma, dst_addr); > > > + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); > > > + ret = -EFAULT; > > > + if (unlikely(offset >= max_off)) > > > + goto out_unlock; > > > > Frankly I don't really know why this must be put into pgtable lock.. Since if > > not required then it can be moved into UFFDIO_COPY path, as CONTINUE doesn't > > need it iiuc. Just raise it up as a pure question. > > It's not clear to me either. shmem_getpage_gfp() does check this twice > kinda like we're doing, but it doesn't ever touch the PTL. What it > seems to be worried about is, what happens if a concurrent > FALLOC_FL_PUNCH_HOLE happens somewhere in the middle of whatever > manipulation we're doing? From looking at shmem_fallocate(), I think > the basic point is that truncation happens while "inode_lock(inode)" > is held, but neither shmem_mcopy_atomic_pte() or the new > mcopy_atomic_install_ptes() take that lock. > > I'm a bit hesitant to just remove it, run some tests, and then declare > victory, because it seems plausible it's there to catch some > semi-hard-to-induce race. I'm not sure how to prove that *isn't* > needed, so my inclination is to just keep it? > > I'll send a series addressing the feedback so far this afternoon, and > I'll leave this alone for now - at least, it doesn't seem to hurt > anything. Maybe Hugh or someone else has some more advice about it. If > so, I'm happy to remove it in a follow-up. It takes some thinking about, but the i_size check is required to be under the pagetable lock, for the MAP_PRIVATE UFFDIO_COPY path, where it is inserting an anonymous page into the file-backed vma (skipping actually inserting a page into page cache, as an ordinary fault would). Not because of FALLOC_FL_PUNCH_HOLE (which makes no change to i_size; and it's okay if a race fills in the hole immediately afterwards), but because of truncation (which must remove all beyond i_size). In the MAP_SHARED case, with a locked page inserted into page cache, the page lock is enough to exclude concurrent truncation. But even in that case the second i_size check (I'm looking at 5.12-rc's shmem_mfill_atomic_pte(), rather than recent patches which might differ) is required: because the first i_size check was done before the page became visible in page cache, so a concurrent truncation could miss it). Maybe that first check is redundant, though I'm definitely for doing it; or maybe shmem_add_to_page_cache() would be better if it made that check itself, under xas_lock (I think the reason it does not is historical). The second check, in the MAP_SHARED case, does not need to be under pagetable lock - the page lock on the page cache page is enough - but probably Andrea placed it there to resemble the anonymous case. You might then question, how come there is no i_size check in all of mm/memory.c, where ordinary faulting is handled. I'll answer that the pte_same() check, under pagetable lock in wp_page_copy(), is where the equivalent to userfaultfd's MAP_PRIVATE UFFDIO_COPY check is made: if the page cache page has already been truncated, that pte will have been cleared. Or, if the page cache page is truncated an instant after wp_page_copy() drops page table lock, then the unmap_mapping_range(,,, even_cows = 1) which follows truncation has to clean it up. Er, does that mean that the i_size check I started off insisting is required, actually is not required? Um, maybe, but let's just keep it and say goodnight! Hugh