Hi, Jann,

On Mon, Sep 21, 2020 at 11:55:06PM +0200, Jann Horn wrote:
> On Mon, Sep 21, 2020 at 11:20 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > This patch is greatly inspired by the discussions on the list from Linus, Jason
> > Gunthorpe and others [1].
> >
> > It allows copy_pte_range() to do early cow if the pages were pinned on the
> > source mm. Currently we don't have an accurate way to know whether a page is
> > pinned or not. The only thing we have is page_maybe_dma_pinned(). However
> > that's good enough for now. Especially, with the newly added mm->has_pinned
> > flag to make sure we won't affect processes that never pinned any pages.
>
> To clarify: This patch only handles pin_user_pages() callers and
> doesn't try to address other GUP users, right? E.g. if task A uses
> process_vm_write() on task B while task B is going through fork(),
> that can still race in such a way that the written data only shows up
> in the child and not in B, right?

I saw that process_vm_write() is using pin_user_pages_remote(), so I think
that after this patch is applied the data will only be written to B but not
to the child. That is because when B calls fork() with these temporarily
pinned pages, fork() will copy the pages rather than write-protect them.
IIUC the child could still see partial data, but in the end (after the pages
are unpinned) B should always have the complete data set.

>
> I dislike the whole pin_user_pages() concept because (as far as I
> understand) it fundamentally tries to fix a problem in the subset of
> cases that are more likely to occur in practice (long-term pins
> overlapping with things like writeback), and ignores the rarer cases
> ("short-term" GUP).

John/Jason or others may be better placed to comment on this one. From my
own understanding, I thought it was the right thing to do, so that we always
guarantee process B gets the whole data. From that pov this patch should make
sense even for short-term GUPs. But maybe I've missed something.

--
Peter Xu