On Mon, Apr 12, 2021 at 4:17 PM Peter Xu <peterx@xxxxxxxxxx> wrote: > > On Thu, Apr 08, 2021 at 04:43:22PM -0700, Axel Rasmussen wrote: > > +/* > > + * Install PTEs, to map dst_addr (within dst_vma) to page. > > + * > > + * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed), > > + * whether or not dst_vma is VM_SHARED. It also handles the more general > > + * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file > > + * backed, or not). > > + * > > + * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by > > + * shmem_mcopy_atomic_pte instead. > > + */ > > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd, > > + struct vm_area_struct *dst_vma, > > + unsigned long dst_addr, struct page *page, > > + bool newly_allocated, bool wp_copy) > > +{ > > + int ret; > > + pte_t _dst_pte, *dst_pte; > > + int writable; > > + bool vm_shared = dst_vma->vm_flags & VM_SHARED; > > + spinlock_t *ptl; > > + struct inode *inode; > > + pgoff_t offset, max_off; > > + > > + _dst_pte = mk_pte(page, dst_vma->vm_page_prot); > > + writable = dst_vma->vm_flags & VM_WRITE; > > + /* For private, non-anon we need CoW (don't write to page cache!) */ > > + if (!vma_is_anonymous(dst_vma) && !vm_shared) > > + writable = 0; > > + > > + if (writable || vma_is_anonymous(dst_vma)) > > + _dst_pte = pte_mkdirty(_dst_pte); > > + if (writable) { > > + if (wp_copy) > > + _dst_pte = pte_mkuffd_wp(_dst_pte); > > + else > > + _dst_pte = pte_mkwrite(_dst_pte); > > + } else if (vm_shared) { > > + /* > > + * Since we didn't pte_mkdirty(), mark the page dirty or it > > + * could be freed from under us. We could do this > > + * unconditionally, but doing it only if !writable is faster. > > + */ > > + set_page_dirty(page); > > + } > > + > > + dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); > > + > > + if (vma_is_shmem(dst_vma)) { > > + /* The shmem MAP_PRIVATE case requires checking the i_size */ > > When you start to use this function in the last patch it'll be needed too even > if MAP_SHARED? > > How about directly state the reason of doing this ("serialize against truncate > with the PT lock") instead of commenting about "who will need it"? > > > + inode = dst_vma->vm_file->f_inode; > > + offset = linear_page_index(dst_vma, dst_addr); > > + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE); > > + ret = -EFAULT; > > + if (unlikely(offset >= max_off)) > > + goto out_unlock; > > + } > > [...] > > > +/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */ > > +static int mcontinue_atomic_pte(struct mm_struct *dst_mm, > > + pmd_t *dst_pmd, > > + struct vm_area_struct *dst_vma, > > + unsigned long dst_addr, > > + bool wp_copy) > > +{ > > + struct inode *inode = file_inode(dst_vma->vm_file); > > + pgoff_t pgoff = linear_page_index(dst_vma, dst_addr); > > + struct page *page; > > + int ret; > > + > > + ret = shmem_getpage(inode, pgoff, &page, SGP_READ); > > SGP_READ looks right, as we don't want page allocation. However I noticed > there's very slight difference when the page was just fallocated: > > /* fallocated page? */ > if (page && !PageUptodate(page)) { > if (sgp != SGP_READ) > goto clear; > unlock_page(page); > put_page(page); > page = NULL; > hindex = index; > } > > I think it won't happen for your case since the page should be uptodate already > (the other thread should check and modify the page before CONTINUE), but still > raise this up, since if the page was allocated it smells better to still > install the fallocated page (do we need to clear the page and SetUptodate)? Sorry for the somewhat rambling thought process: My first thought is, I don't really know what PageUptodate means for shmem pages. If I understand correctly, normally we say PageUptodate() if the in memory data is more recent or equivalent to the on-disk data. But, shmem pages are entirely in memory - they are file backed in name only, in some sense. fallocate() does all sorts of things so the comment to me seems a bit ambiguous, but it seems the implication is that we're worried specifically about the case where the shmem page was recently allocated with fallocate(mode=0)? In that case, do we use !PageUptodate() to denote that the page has been allocated, but its contents are undefined? I suppose that would make sense, as the action "goto clear;" generally memset()-s the page to zero it, and then calls SetPageUptodate(). Okay so let's say the following sequence of events happens: 1. Userspace calls fallocate(mode=0) to allocate some shmem pages. 2. Another thread, via a UFFD-registered mapping, manages to trigger a minor fault on one such page, while we still have !PageUptodate(). (I'm not 100% sure this can happen, but let's say it can.) 3. UFFD handler thread gets the minor fault event, and for whatever (buggy?) reason does nothing - it doesn't modify the page, it just calls CONTINUE. I think if we get to this point, zeroing the page, returning it, and setting up the PTEs seems somewhat reasonable to me. I suppose alternatively we could notice that this happened and return an error to the caller? I'm hesitant to mess with the behavior of shmem_getpage_gfp() to make such a thing happen though. I do think if we're going to set up the PTEs instead of returning an error, we definitely do need to clear and SetPageUptodate() the page first. In conclusion, I think this behavior is correct. > > -- > Peter Xu >