On 3/16/21 8:43 PM, Linus Torvalds wrote:
[ Adding btrfs people explicitly, maybe they see this on the fs-devel
list, but maybe they don't react .. ]
On Tue, Mar 16, 2021 at 12:07 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
This isn't a problem with this patch per se, but I'm concerned about
private2 and expected page refcounts.
Ugh. You are very right.
It would be good to just change the rules - I get the feeling nobody
actually depended on them anyway because they were _so_ esoteric.
static inline int is_page_cache_freeable(struct page *page)
{
/*
* A freeable page cache page is referenced only by the caller
* that isolated the page, the page cache and optional buffer
* heads at page->private.
*/
int page_cache_pins = thp_nr_pages(page);
return page_count(page) - page_has_private(page) == 1 + page_cache_pins;
You're right, that "page_has_private()" is really really nasty.
The comment is, I think, the traditional usage case, which used to be
about page->buffers. Obviously these days it is now about
page->private with PG_private set, pointing to buffers
(attach_page_private() and detach_page_private()).
But as you point out:
#define PAGE_FLAGS_PRIVATE \
(1UL << PG_private | 1UL << PG_private_2)
So ... a page with both flags cleared should have a refcount of N.
A page with one or both flags set should have a refcount of N+1.
Could we just remove the PG_private_2 thing in this context entirely,
and make the rule be that
(a) PG_private means that you have some local private data in
page->private, and that's all that matters for the "freeable" thing.
(b) PG_private_2 does *not* have the same meaning, and has no bearing
on freeability (and only the refcount matters)
I _)think_ the btrfs behavior is to only use PagePrivate2() when it
has a reference to the page, so btrfs doesn't care?
I think fscache is already happy to take the page count when using
PG_private_2 for locking, exactly because I didn't want to have any
confusion about lifetimes. But this "page_has_private()" math ends up
meaning it's confusing anyway.
btrfs people? What are the semantics for PG_private_2? Is it just a
flag, and you really don't want it to have anything to do with any
page lifetime decisions? Or?
Yeah it's just a flag, we use it to tell that the page is part of a range that
has been allocated for IO. The lifetime of the page is independent of the page,
but is generally either dirty or under writeback, so either it goes through
truncate and we clear PagePrivate2 there, or it actually goes through IO and is
cleared before we drop the page in our endio. We _always_ have PG_private set
on the page as long as we own it, and PG_private_2 is only set in this IO
related context, so we're safe there because of the rules around
PG_dirty/PG_writeback. We don't need it to have an extra ref for it being set.
Thanks,
Josef