On 05.05.20 16:01, Christian Borntraeger wrote: > > > On 05.05.20 15:55, Ulrich Weigand wrote: >> On Tue, May 05, 2020 at 05:34:45AM -0700, Dave Hansen wrote: >>> On 5/4/20 6:41 AM, Ulrich Weigand wrote: >>>> You're right that there is no mechanism to prevent new references, >>>> but that's really never been the goal either. We're simply trying >>>> to ensure that no I/O is ever done on a page that is in the "secure" >>>> (or inaccessible) state. To do so, we rely on the assumption that >>>> all code that starts I/O on a page cache page will *first*: >>>> - mark the page as pending I/O by either taking an extra page >>>> count, or by setting the Writeback flag; then: >>>> - call arch_make_page_accessible(); then: >>>> - start I/O; and only after I/O has finished: >>>> - remove the "pending I/O" marker (Writeback and/or extra ref) >>> >>> Let's ignore writeback for a moment because get_page() is the more >>> general case. The locking sequence is: >>> >>> 1. get_page() (or equivalent) "locks out" a page from converting to >>> inaccessbile, >>> 2. followed by a make_page_accessible() guarantees that the page >>> *stays* accessible until >>> 3. I/O is safe in this region >>> 4. put_page(), removes the "lock out", I/O now unsafe >> >> Yes, exactly. >> >>> They key is, though, the get_page() must happen before >>> make_page_accessible() and *every* place that acquires a new reference >>> needs a make_page_accessible(). >> >> Well, sort of: every place that acquires a new reference *and then >> performs I/O* needs a make_page_accessible(). There seem to be a >> lot of plain get_page() calls that aren't related to I/O. >> >>> try_get_page() is obviously one of those "new reference sites" and it >>> only has one call site outisde of the gup code: generic_pipe_buf_get(), >>> which is effectively patched by the patch that started this thread. The >>> fact that this one oddball site _and_ gup are patched now is a good sign. >>> >>> *But*, I still don't know how that could work nicely: >>> >>>> static inline __must_check bool try_get_page(struct page *page) >>>> { >>>> page = compound_head(page); >>>> if (WARN_ON_ONCE(page_ref_count(page) <= 0)) >>>> return false; >>>> page_ref_inc(page); >>>> return true; >>>> } >>> >>> If try_get_page() collides with a freeze_page_refs(), it'll hit the >>> WARN_ON_ONCE(), which is surely there for a good reason. I'm not sure >>> that warning is _actually_ valid since freeze_page_refs() isn't truly a >>> 0 refcount. But, the fact that this hasn't been encountered means that >>> the testing here is potentially lacking. >> >> This is indeed interesting. In particular if you compare try_get_page >> with try_get_compound_head in gup.c, which does instead: >> >> if (WARN_ON_ONCE(page_ref_count(head) < 0)) >> return NULL; >> >> which seems more reasonable to me, given the presence of the >> page_ref_freeze method. So I'm not sure why try_get_page has <= 0. > > > Just looked at > commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 mm: add 'try_get_page()' helper function > > which says: > Also like 'get_page()', you can't use this function unless you already > had a reference to the page. The intent is that you can use this > exactly like get_page(), but in situations where you want to limit the > maximum reference count. > > The code currently does an unconditional WARN_ON_ONCE() if we ever hit > the reference count issues (either zero or negative), as a notification > that the conditional non-increment actually happened. > > If try_get_page must not be called with an existing reference, that means s/not// > that when we call it the page reference is already higher and our freeze > will never succeed. That would imply that we cannot trigger this. No? >