On 3/19/19 7:06 AM, Kirill A. Shutemov wrote:
> On Tue, Mar 19, 2019 at 09:47:24AM -0400, Jerome Glisse wrote:
>> On Tue, Mar 19, 2019 at 03:04:17PM +0300, Kirill A. Shutemov wrote:
>>> On Fri, Mar 08, 2019 at 01:36:33PM -0800, john.hubbard@xxxxxxxxx wrote:
>>>> From: John Hubbard <jhubbard@xxxxxxxxxx>
>>
>> [...]
>>
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index f84e22685aaa..37085b8163b1 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -28,6 +28,88 @@ struct follow_page_context {
>>>>  	unsigned int page_mask;
>>>>  };
>>>>  
>>>> +typedef int (*set_dirty_func_t)(struct page *page);
>>>> +
>>>> +static void __put_user_pages_dirty(struct page **pages,
>>>> +				   unsigned long npages,
>>>> +				   set_dirty_func_t sdf)
>>>> +{
>>>> +	unsigned long index;
>>>> +
>>>> +	for (index = 0; index < npages; index++) {
>>>> +		struct page *page = compound_head(pages[index]);
>>>> +
>>>> +		if (!PageDirty(page))
>>>> +			sdf(page);
>>>
>>> How is this safe? What prevents the page from being cleared under you?
>>>
>>> If it's safe to race clear_page_dirty*(), that has to be stated explicitly
>>> with a reason why. It's not very clear to me as it is.
>>
>> The PageDirty() optimization above is fine to race with clearing the
>> page flag, because that clear can only happen after a page_mkclean(),
>> which means the GUP user is done with the page and it is about to be
>> written back. So if (!PageDirty(page)) sees the page as dirty and
>> skips the sdf() call, while a split second later TestClearPageDirty()
>> runs, the racing clear is simply preparing the page for writeback:
>> all is fine (the page was dirty and is being cleared for writeback).
>>
>> If sdf() does get called while racing with writeback, then we have
>> just redirtied the page, exactly as clear_page_dirty_for_io() would
>> do if page_mkclean() failed, so nothing harmful comes of that either.
>> The page stays dirty despite the writeback; it just means the page
>> might be written back twice in a row.
>
> Fair enough. Should we get it into a comment here?
How does this read to you? I reworded and slightly expanded Jerome's description:

diff --git a/mm/gup.c b/mm/gup.c
index d1df7b8ba973..86397ae23922 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -61,6 +61,24 @@ static void __put_user_pages_dirty(struct page **pages,
 	for (index = 0; index < npages; index++) {
 		struct page *page = compound_head(pages[index]);
 
+		/*
+		 * Checking PageDirty at this point may race with
+		 * clear_page_dirty_for_io(), but that's OK. Two key cases:
+		 *
+		 * 1) This code sees the page as already dirty, so it skips
+		 *    the call to sdf(). That could happen because
+		 *    clear_page_dirty_for_io() called page_mkclean(),
+		 *    followed by set_page_dirty(). However, now the page is
+		 *    going to get written back, which meets the original
+		 *    intention of setting it dirty, so all is well:
+		 *    clear_page_dirty_for_io() goes on to call
+		 *    TestClearPageDirty(), and write the page back.
+		 *
+		 * 2) This code sees the page as clean, so it calls sdf().
+		 *    The page stays dirty, despite being written back, so it
+		 *    gets written back again in the next writeback cycle.
+		 *    This is harmless.
+		 */
 		if (!PageDirty(page))
 			sdf(page);

>
>>>> +void put_user_pages(struct page **pages, unsigned long npages)
>>>> +{
>>>> +	unsigned long index;
>>>> +
>>>> +	for (index = 0; index < npages; index++)
>>>> +		put_user_page(pages[index]);
>>>
>>> I believe there's room for improvement for compound pages.
>>>
>>> If there are multiple consecutive pages in the array that belong to the
>>> same compound page, we can get away with a single atomic operation to
>>> handle them all.
>>
>> Yes, maybe just add a comment about that for now and leave this kind of
>> optimization for later?
>
> Sounds good to me.
Here's a comment for that:

@@ -127,6 +145,11 @@ void put_user_pages(struct page **pages, unsigned long npages)
 {
 	unsigned long index;
 
+	/*
+	 * TODO: this can be optimized for huge pages: if a series of pages is
+	 * physically contiguous and part of the same compound page, then a
+	 * single operation to the head page should suffice.
+	 */
 	for (index = 0; index < npages; index++)
 		put_user_page(pages[index]);
 }

thanks,
--
John Hubbard
NVIDIA