Re: [RFC 2/2] iomap: Support subpage size dirty tracking to improve write performance


On Mon, Oct 31, 2022 at 06:08:53PM +1100, Dave Chinner wrote:
> On Mon, Oct 31, 2022 at 03:43:24AM +0000, Matthew Wilcox wrote:
> > On Sat, Oct 29, 2022 at 08:04:22AM +1100, Dave Chinner wrote:
> > > As it is, we already have the capability for the mapping tree to
> > > have multiple indexes pointing to the same folio - perhaps it's time
> > > to start thinking about using filesystem blocks as the mapping tree
> > > index rather than PAGE_SIZE chunks, so that the page cache can then
> > > track dirty state on filesystem block boundaries natively and
> > > this whole problem goes away. We have to solve this sub-folio dirty
> > > tracking problem for multi-page folios anyway, so it seems to me
> > > that we should solve the sub-page block size dirty tracking problem
> > > the same way....
> > 
> > That's an interesting proposal.  From the page cache's point of
> > view right now, there is only one dirty bit per folio, not per page.
> 
> Per folio, yes, but I thought we also had a dirty bit per index
> entry in the mapping tree. Writeback code uses the
> PAGECACHE_TAG_DIRTY mark to find the dirty folios efficiently (i.e.
> the write_cache_pages() iterator), so it's not like this is
> something new. i.e. we already have coherent, external dirty bit
> tracking mechanisms outside the folio itself that filesystems
> use.

That bit only exists (logically) for the canonical entry.  Physically
it exists for sibling entries, but it's not used; attempting to set
it on sibling entries will redirect to set it on the canonical entry.
That could be changed, but we elide entire layers of the tree once the
entry has a sufficiently high order.  So an order-6 folio occupies
a single slot one layer up; an order-7 folio occupies two slots, an
order-8 folio occupies four slots and so on.
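To make the slot arithmetic above concrete, here is a small userspace sketch. It assumes 64 slots per node (an XA_CHUNK_SHIFT of 6, the usual configuration) and models only the layout, not the real XArray API; CHUNK_SHIFT, canonical_index(), entry_level() and entry_slots() are made-up names for illustration.

```c
#include <assert.h>

/* Assumption: 64 slots per tree node, i.e. a chunk shift of 6. */
#define CHUNK_SHIFT 6

/*
 * Index of the canonical entry for an order-`order` folio that covers
 * `index`.  Setting a mark through any sibling index ends up here.
 */
static unsigned long canonical_index(unsigned long index, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return index & ~((1UL << order) - 1);
}

/* Tree level (0 = leaf nodes) at which a folio of this order is stored. */
static unsigned int entry_level(unsigned int order)
{
	return order / CHUNK_SHIFT;
}

/*
 * Slots the folio occupies at that level: one canonical entry plus
 * (slots - 1) sibling entries.
 */
static unsigned int entry_slots(unsigned int order)
{
	return 1U << (order % CHUNK_SHIFT);
}
```

With these helpers, order 6 maps to one slot at level 1, order 7 to two slots and order 8 to four, matching the description above. canonical_index(77, 4) returns 64, the slot where a mark set through index 77 of that order-4 folio would actually live.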

My eventual goal is to ditch the radix tree and use the Maple Tree
(i.e. a B-tree), and that will always have only one slot per folio, no
matter what order it has.  Then there really will be only one bit per
folio.

> > We have a number of people looking at the analogous problem for network
> > filesystems right now.  Dave Howells' netfs infrastructure is trying
> > to solve the problem for everyone (and he's been looking at iomap as
> > inspiration for what he's doing).  I'm kind of hoping we end up with one
> > unified solution that can be used for all filesystems that want sub-folio
> > dirty tracking.  His solution is a bit more complex than I really want
> > to see, at least partially because he's trying to track dirtiness at
> > byte granularity, no matter how much pain that causes to the server.
> 
> Byte range granularity is probably overkill for block based
> filesystems - all we need is a couple of extra bits per block to be
> stored in the mapping tree alongside the folio....
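As a rough illustration of what per-block dirty state buys, the following sketch keeps one dirty bit per filesystem block for a single folio. The struct, the helper names and the 64-block limit are hypothetical; this is not iomap's actual per-folio state layout, and a real design would probably keep uptodate bits alongside the dirty bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit, e.g. a 64 KiB folio with 1 KiB blocks. */
#define MAX_BLOCKS 64
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-folio state: one dirty bit per filesystem block. */
struct folio_blk_state {
	unsigned int blkbits;	/* log2(filesystem block size) */
	unsigned int nr_blocks;	/* blocks in this folio */
	unsigned long dirty[(MAX_BLOCKS + BITS_PER_ULONG - 1) / BITS_PER_ULONG];
};

static void blk_state_init(struct folio_blk_state *s, unsigned int blkbits,
			   size_t folio_size)
{
	memset(s, 0, sizeof(*s));
	s->blkbits = blkbits;
	s->nr_blocks = (unsigned int)(folio_size >> blkbits);
	assert(s->nr_blocks <= MAX_BLOCKS);
}

/* Mark every block touched by the byte range [off, off + len) dirty. */
static void blk_mark_dirty(struct folio_blk_state *s, size_t off, size_t len)
{
	if (len == 0)
		return;
	unsigned int first = (unsigned int)(off >> s->blkbits);
	unsigned int last = (unsigned int)((off + len - 1) >> s->blkbits);

	assert(last < s->nr_blocks);
	for (unsigned int b = first; b <= last; b++)
		s->dirty[b / BITS_PER_ULONG] |= 1UL << (b % BITS_PER_ULONG);
}

static bool blk_is_dirty(const struct folio_blk_state *s, unsigned int b)
{
	return s->dirty[b / BITS_PER_ULONG] & (1UL << (b % BITS_PER_ULONG));
}

/* Number of dirty blocks, i.e. how many blocks writeback must submit. */
static unsigned int blk_nr_dirty(const struct folio_blk_state *s)
{
	unsigned int n = 0;

	for (unsigned int b = 0; b < s->nr_blocks; b++)
		n += blk_is_dirty(s, b);
	return n;
}
```

For example, writing 100 bytes at offset 1500 into a 64 KiB folio with 1 KiB blocks dirties only block 1, so writeback would submit 1 KiB rather than the whole folio that a single per-folio dirty bit forces.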

I think it's overkill for network filesystems too.  By sending a
sector-misaligned write to the server, you force the server to do an R-M-W
before it commits the write to storage.  That assumes the file has fallen
out of the server's cache, which is likely: a sufficiently busy server
probably doesn't have the memory capacity for the working set of all of
its clients.
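Whether a given write forces that R-M-W comes down to simple alignment arithmetic. This sketch assumes a power-of-two sector size on the server's storage; struct aligned_range, align_out() and needs_rmw() are illustrative names, not part of any real client or server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte range [start, end), end exclusive. */
struct aligned_range {
	uint64_t start, end;
};

/* Round [pos, pos + len) outwards to sector boundaries. */
static struct aligned_range align_out(uint64_t pos, uint64_t len,
				      uint64_t sector_size)
{
	assert(sector_size && (sector_size & (sector_size - 1)) == 0);
	uint64_t mask = sector_size - 1;
	struct aligned_range r = { pos & ~mask, (pos + len + mask) & ~mask };

	return r;
}

/*
 * True if the write covers only part of a sector at either end, so the
 * server must read the old sector contents before writing it back.
 */
static bool needs_rmw(uint64_t pos, uint64_t len, uint64_t sector_size)
{
	assert(sector_size && (sector_size & (sector_size - 1)) == 0);
	uint64_t mask = sector_size - 1;

	return len != 0 && ((pos & mask) != 0 || ((pos + len) & mask) != 0);
}
```

With 4096-byte sectors, a 10-byte write at offset 100 needs an R-M-W of [0, 4096), while an 8192-byte write at offset 4096 is already aligned and does not.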

Anyway, Dave's plan for dirty tracking (as I understand the current
iteration) is to not store it linked from folio->private at all, but to
store it in a per-file tree of writes.  Then, instead of walking the page
cache looking for dirty folios, we would walk the tree of writes, choosing
which ones to write back and deleting them from the tree.  I don't know how
this will perform in practice, but it'll be generic enough to work for
any filesystem.
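One way to picture a per-file tree of writes is a sorted set of non-overlapping byte ranges that merges on insert and is drained by writeback. The sketch below is hypothetical and is not the netfs API: a fixed-size sorted array stands in for the tree, and struct write_set, write_set_add() and write_set_pop() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WRITES 32	/* hypothetical capacity */

struct write_range {
	uint64_t start, end;	/* [start, end) */
};

/* Sorted by start; ranges never overlap or touch. */
struct write_set {
	size_t nr;
	struct write_range r[MAX_WRITES];
};

/* Record a write of [start, end), merging overlapping or adjacent ranges. */
static void write_set_add(struct write_set *ws, uint64_t start, uint64_t end)
{
	size_t i = 0, j;

	assert(start < end);
	/* Skip ranges that end strictly before the new one begins. */
	while (i < ws->nr && ws->r[i].end < start)
		i++;
	/* Absorb every range that overlaps or touches [start, end). */
	for (j = i; j < ws->nr && ws->r[j].start <= end; j++) {
		if (ws->r[j].start < start)
			start = ws->r[j].start;
		if (ws->r[j].end > end)
			end = ws->r[j].end;
	}
	if (j == i) {
		/* Nothing absorbed: open a gap at i for the new range. */
		assert(ws->nr < MAX_WRITES);
		memmove(&ws->r[i + 1], &ws->r[i], (ws->nr - i) * sizeof(ws->r[0]));
		ws->nr++;
	} else {
		/* Collapse r[i..j) into the single slot r[i]. */
		memmove(&ws->r[i + 1], &ws->r[j], (ws->nr - j) * sizeof(ws->r[0]));
		ws->nr -= j - i - 1;
	}
	ws->r[i].start = start;
	ws->r[i].end = end;
}

/* Writeback: remove the lowest-offset range; returns 0 if none is left. */
static int write_set_pop(struct write_set *ws, struct write_range *out)
{
	if (ws->nr == 0)
		return 0;
	*out = ws->r[0];
	memmove(&ws->r[0], &ws->r[1], (ws->nr - 1) * sizeof(ws->r[0]));
	ws->nr--;
	return 1;
}
```

Merging adjacent writes on insert is one possible policy, chosen here so writeback issues fewer, larger writes; which range to pop first is likewise a policy choice, and this sketch simply takes the lowest offset.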


