Re: [PATCH v2] block : add larger order folio size instead of pages

On 5/4/24 18:32, Matthew Wilcox wrote:
On Sat, May 04, 2024 at 02:35:15PM +0200, Hannes Reinecke wrote:
I think this is wandering into a minefield.  I'm pretty sure
it's considered valid to split the bio, and complete the two halves
independently.  Each one will put the refcounts for the pages it touches,
and if we do this early putting of references, that's going to fail.

Precisely my worries. Something I want to talk to you about at LSF;
refcounting of folios vs refcounting of pages.
When one takes a refcount on a folio we are actually taking a refcount
on the first page, which is okay if we stick with using the folio throughout
the call chain. But if we start mixing between pages and folios (as we do
here) we will be getting the refcount wrong.

Do you have plans for how we could improve the situation?
Like a warning 'Hey, you've used the folio for taking the reference, but now
you are releasing the references for the page'?

This is a fairly common misunderstanding, but TLDR: problem solved long
before I started this project.

Individual pages don't actually have a refcount.  I know it looks
like they do, and they kind of do, but for tail pages, the refcount is
always 0.  Functions like get_page() and put_page() always operate on
the head page (ie folio) refcount.

Precisely.

Specifically, I think you're concerned about pages coming from GUP.
Take a look at try_get_folio().  We pass in a struct page, explicitly
get the refcount on a folio, check the page is still part of the
folio, then return the folio.  And we return the page to the caller
because the caller needs to know the precise page at that address,
not the folio that contains it.
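
The sequence described above (look up the folio, take the reference on it, re-check that the page still belongs to it) can be sketched as a single-threaded userspace toy. This is only an illustration, not the kernel code: `struct toy_page`, `toy_page_folio()` and `toy_try_get_folio()` are hypothetical names, the real function uses atomic operations, and the re-check only matters when the folio can be split concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's layout (assumption). */
struct toy_page {
	int refcount;           /* only used on the head page */
	struct toy_page *head;  /* NULL on the head page, else the head */
};

/* Map any page to the head page, which stands in for the folio. */
static struct toy_page *toy_page_folio(struct toy_page *page)
{
	return page->head ? page->head : page;
}

/*
 * Take 'refs' references on the folio containing 'page'.
 * Returns the folio (head page), or NULL if the folio has no
 * references left and so must not be revived.
 */
static struct toy_page *toy_try_get_folio(struct toy_page *page, int refs)
{
	for (;;) {
		struct toy_page *folio = toy_page_folio(page);

		if (folio->refcount <= 0)
			return NULL;
		folio->refcount += refs;

		/* Re-check: is the page still part of the folio we pinned? */
		if (toy_page_folio(page) == folio)
			return folio;

		/* It moved underneath us: drop the references and retry. */
		folio->refcount -= refs;
	}
}
```

Calling `toy_try_get_folio()` on a tail page returns the head page and leaves the tail's own counter untouched; the caller still keeps the original page pointer for the exact address.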

There are functions which don't surreptitiously call compound_head()
behind your back.  set_page_count(), for example.  And page_ref_count()
(rather than the more normal page_count()).
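
To make the difference concrete, here is a minimal userspace model, assuming a simplified page structure. All `toy_*` names are hypothetical stand-ins for the kernel functions named above; only the head-versus-raw lookup behaviour is modelled, with no atomics or real memory management.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page; NOT the kernel's layout (assumption). */
struct toy_page {
	int refcount;           /* stays 0 on tail pages */
	struct toy_page *head;  /* NULL on the head page, else the head */
};

/* Models compound_head(): map any page to its head page. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head ? page->head : page;
}

/* Model get_page()/put_page(): always operate on the head page. */
static void toy_get_page(struct toy_page *page)
{
	toy_compound_head(page)->refcount++;
}

static void toy_put_page(struct toy_page *page)
{
	toy_compound_head(page)->refcount--;
}

/* Models page_count(): looks up the head page first. */
static int toy_page_count(struct toy_page *page)
{
	return toy_compound_head(page)->refcount;
}

/* Models page_ref_count(): reads the raw field, no head lookup. */
static int toy_page_ref_count(const struct toy_page *page)
{
	return page->refcount;
}

/* Build a compound page of n pages with one reference on the head. */
static void toy_init_compound(struct toy_page *pages, size_t n)
{
	pages[0].refcount = 1;
	pages[0].head = NULL;
	for (size_t i = 1; i < n; i++) {
		pages[i].refcount = 0;
		pages[i].head = &pages[0];
	}
}
```

In this model, taking a reference through a tail page raises `toy_page_count()` for every page of the compound page, while `toy_page_ref_count()` on that tail page still reads 0. Taking a reference through one page and dropping it through another leaves the count balanced.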

And none of this is true if you don't use __GFP_COMP.  But let's call
that an aberration that must die.

Ah, right. So the refcount for a page is always unwound to use the refcount of the enclosing folio.

I was actually concerned with the iov_iter functions, where we take a
reference for each page. Currently iov_iter is iterating in units of
PAGE_SIZE, so there is no easy way of converting that to folios.

But one step at a time, I guess. First get the blocksize > pagesize patches in.

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@xxxxxxx                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich




