Re: [PATCH v2] block : add larger order folio size instead of pages

On Sat, May 04, 2024 at 02:35:15PM +0200, Hannes Reinecke wrote:
> > I think this is wandering into a minefield.  I'm pretty sure
> > it's considered valid to split the bio, and complete the two halves
> > independently.  Each one will put the refcounts for the pages it touches,
> > and if we do this early putting of references, that's going to fail.
> 
> Precisely my worries. Something I want to talk to you about at LSF;
> refcounting of folios vs refcounting of pages.
> When one takes a refcount on a folio we are actually taking a refcount
> on the first page, which is okay if we stick with using the folio throughout
> the call chain. But if we start mixing between pages and folios (as we do
> here) we will be getting the refcount wrong.
> 
> Do you have plans how we could improve the situation?
> Like a warning 'Hey, you've used the folio for taking the reference, but now
> you are releasing the references for the page'?

This is a fairly common misunderstanding, but TLDR: problem solved long
before I started this project.

Individual pages don't actually have a refcount.  I know it looks
like they do, and they kind of do, but for tail pages, the refcount is
always 0.  Functions like get_page() and put_page() always operate on
the head page (ie folio) refcount.
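
To make that concrete, here is a toy user-space model, not kernel code.
The toy_* names and struct toy_page are hypothetical stand-ins for
struct page, compound_head(), get_page() and put_page(); the only thing
the sketch is meant to show is that a get or put on a tail page lands on
the head page's counter while the tail's own counter stays at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page (hypothetical, heavily simplified). */
struct toy_page {
    int refcount;            /* only meaningful on the head page */
    struct toy_page *head;   /* NULL on a head page, else the head */
};

/* Models compound_head(): resolve any page to its head page. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->head ? page->head : page;
}

/* Models get_page(): the reference is taken on the head. */
static void toy_get_page(struct toy_page *page)
{
    toy_compound_head(page)->refcount++;
}

/* Models put_page(): the reference is dropped on the head. */
static void toy_put_page(struct toy_page *page)
{
    struct toy_page *head = toy_compound_head(page);

    assert(head->refcount > 0);
    head->refcount--;
}

/* Build a compound page of nr pages: head holds 1 ref, tails hold 0. */
static void toy_init_compound(struct toy_page *pages, size_t nr)
{
    pages[0].refcount = 1;
    pages[0].head = NULL;
    for (size_t i = 1; i < nr; i++) {
        pages[i].refcount = 0;
        pages[i].head = &pages[0];
    }
}
```

In this model, taking a reference through one tail page and dropping it
through a different tail page of the same compound page balances out,
because both calls resolve to the same head.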

Specifically, I think you're concerned about pages coming from GUP.
Take a look at try_get_folio().  We pass in a struct page, explicitly
get the refcount on a folio, check the page is still part of the
folio, then return the folio.  GUP still hands the page back to its
caller, because the caller needs to know the precise page at that
address, not the folio that contains it.
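
The shape of that sequence can be sketched in a toy user-space model.
This is an illustration under stated assumptions, not the kernel's
try_get_folio(): the toy_* names and struct toy_page are hypothetical,
the counter is a plain int rather than an atomic, and a zero count
stands in for "this folio is being freed".  Single-threaded, the
re-check can never fail; it is there because in the real thing the page
can move to a different folio between the lookup and the increment.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page (hypothetical, heavily simplified). */
struct toy_page {
    int refcount;            /* only meaningful on the head page */
    struct toy_page *head;   /* NULL on a head page, else the head */
};

/* Models page_folio(): resolve a page to the folio (head) holding it. */
static struct toy_page *toy_page_folio(struct toy_page *page)
{
    return page->head ? page->head : page;
}

/* Add refs unless the count is already zero (folio going away). */
static int toy_ref_try_add(struct toy_page *folio, int refs)
{
    if (folio->refcount == 0)
        return 0;
    folio->refcount += refs;
    return 1;
}

/* Look up the folio, take refs on it, confirm the page still belongs. */
static struct toy_page *toy_try_get_folio(struct toy_page *page, int refs)
{
    assert(refs > 0);
    for (;;) {
        struct toy_page *folio = toy_page_folio(page);

        if (!toy_ref_try_add(folio, refs))
            return NULL;
        if (toy_page_folio(page) == folio)
            return folio;
        /* The page moved to another folio: undo and retry. */
        folio->refcount -= refs;
    }
}
```

The caller passes in the page it found at the address and gets back the
folio whose counter now carries the references; it keeps using the
original page pointer for anything that depends on the exact address.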

There are functions which don't surreptitiously call compound_head()
behind your back.  set_page_count(), for example.  And page_ref_count()
(rather than the more normal page_count()).
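
The difference between the two kinds of accessor shows up in a toy
user-space model.  Again this is a sketch with hypothetical toy_* names,
not the kernel definitions: the raw reader and writer touch the field of
whichever page they are handed, while the head-aware reader resolves to
the head first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page (hypothetical, heavily simplified). */
struct toy_page {
    int refcount;            /* only meaningful on the head page */
    struct toy_page *head;   /* NULL on a head page, else the head */
};

/* Models page_ref_count(): raw read of this page's own field. */
static int toy_page_ref_count(const struct toy_page *page)
{
    return page->refcount;
}

/* Models page_count(): resolve to the head, then read its count. */
static int toy_page_count(const struct toy_page *page)
{
    const struct toy_page *head = page->head ? page->head : page;

    return head->refcount;
}

/* Models set_page_count(): raw write to this page's own field. */
static void toy_set_page_count(struct toy_page *page, int count)
{
    assert(count >= 0);
    page->refcount = count;
}
```

Handed a tail page, the raw reader reports the tail's 0 and the
head-aware reader reports the head's count; a raw write to the tail
leaves the head's count untouched.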

And none of this is true if you don't use __GFP_COMP.  But let's call
that an aberration that must die.



