Re: [RFC 2/2] iomap: Support subpage size dirty tracking to improve write performance

On Wed, Nov 02, 2022 at 02:03:11AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 31, 2022 at 03:43:24AM +0000, Matthew Wilcox wrote:
> > I agree that bufferheads do bottom-up dirty tracking, but I don't think
> > that what Ritesh is doing here is bottom-up dirty tracking.  Buffer
> > heads expose an API to dirty a block, which necessarily goes bottom-up.
> > There's no API here to dirty a block.  Instead there's an API to dirty
> > a range of a folio, so we're still top-down; we're just keeping track
> > of it in a more precise way.
> 
> Agreed.

Me three.  Er, too.

Unlike bufferheads, which are scattered all over the kernel, the
details of dirty state tracking are now confined to buffered-io.c, and
the external functions remain the same.  From my POV that makes the
dirty state an implementation detail of iomap that callers don't have
to care about.
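
To make that concrete, the kind of thing I imagine living entirely in
buffered-io.c looks roughly like this.  The struct and function names
here are made up for illustration; this is a sketch, not the actual
RFC code:

struct iomap_folio_state {
        spinlock_t      state_lock;
        /* bits [0, nr_blocks) uptodate; [nr_blocks, 2 * nr_blocks) dirty */
        unsigned long   state[];
};

static void ifs_set_range_dirty(struct folio *folio,
                struct iomap_folio_state *ifs, size_t off, size_t len)
{
        struct inode *inode = folio->mapping->host;
        unsigned int nr_blocks = i_blocks_per_folio(inode, folio);
        unsigned int first = off >> inode->i_blkbits;
        unsigned int last = (off + len - 1) >> inode->i_blkbits;
        unsigned long flags;

        spin_lock_irqsave(&ifs->state_lock, flags);
        bitmap_set(ifs->state, nr_blocks + first, last - first + 1);
        spin_unlock_irqrestore(&ifs->state_lock, flags);
}

Nothing outside buffered-io.c ever needs to know that bitmap exists.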

Just so long as nobody imports that nonfeature of the bufferhead code
where dirtying an already-dirty bufferhead skips marking the folio
dirty, and where writepage failures then also fail to redirty the page.
That bit us hard recently, and I strongly prefer not to repeat it.
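
In sketch form (hand-waving again, and reusing the hypothetical
ifs_set_range_dirty() from above), the property I care about is that
the range-dirtying path calls filemap_dirty_folio() unconditionally:

/* Illustrative only; this signature is made up, not an existing aop. */
static bool iomap_dirty_folio_range(struct address_space *mapping,
                struct folio *folio, size_t off, size_t len)
{
        struct iomap_folio_state *ifs = folio->private;

        if (ifs)
                ifs_set_range_dirty(folio, ifs, off, len);

        /* never skip this just because the range bits were already set */
        return filemap_dirty_folio(mapping, folio);
}

And on a writeback failure the folio needs to be redirtied (e.g. via
folio_redirty_for_writepage()) instead of being left clean.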

> > If there is any dirty region, the folio must be marked dirty (otherwise
> > we'll never know that it needs to be written back).  The interesting
> > question (as your paragraph below hints) is whether removing the dirty
> > part of a folio from a file marks the folio clean.  I believe that's
> > optional, but it's probably worth doing.
> 
> Also agreed.
> 
> > > What happens with direct extent manipulation like fallocate()
> > > operations? These invalidate the parts of the page cache over the
> > > range we are punching, shifting, etc, without interacting directly
> > > with iomap, so do we now have to ensure that the sub-folio dirty
> > > regions are also invalidated correctly? i.e. do functions like
> > > xfs_flush_unmap_range() need to become iomap infrastructure so that
> > > they can update sub-folio dirty ranges correctly?
> > 
> > I'm slightly confused by this question.  As I understand the various
> > fallocate operations, they start by kicking out all the folios affected
> > by the operation (generally from the start of the operation to EOF),
> > so we'd writeback the (dirty part of) folios which are dirty, then
> > invalidate the folios in cache.  I'm not sure there's going to be
> > much difference.
> 
> Yes.  As far as I can tell all pagecache manipulation for the
> fallocate operations is driven by the file system, and it is
> only done for the punch/zero/move ranges.  The file system
> then goes through the normal pagecache truncate helpers, rounded to
> the block size, which through the ops should do the right thing.

Yes, AFAICT.
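
For reference, the flush+invalidate sequence in question is roughly
this pattern (paraphrasing xfs_flush_unmap_range() from memory, so the
details may differ):

static int flush_unmap_range(struct inode *inode, loff_t offset, loff_t len)
{
        loff_t rounding = max_t(loff_t, 1 << inode->i_blkbits, PAGE_SIZE);
        loff_t start = round_down(offset, rounding);
        loff_t end = round_up(offset + len, rounding) - 1;
        int error;

        /* write back (the dirty parts of) any dirty folios in the range */
        error = filemap_write_and_wait_range(inode->i_mapping, start, end);
        if (error)
                return error;

        /* then kick the folios out of the page cache */
        truncate_pagecache_range(inode, start, end);
        return 0;
}

The writeback pass picks up whatever the dirty tracking says needs
writing, and the truncate then drops the folios, so I don't see an
obvious extra step needed for the sub-folio dirty state.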

> > Yes.  This is also going to be a performance problem.  Marking a folio as
> > dirty is no longer just setting the bit in struct folio and the xarray
> > but also setting all the bits in iop->state.  Depending on the size
> > of the folio, and the fs blocksize, this could be quite a lot of bits.
> > eg a 2MB folio with a 1k block size is 2048 bits (256 bytes, 6 cachelines
> > (it dirties the spinlock in cacheline 0, then the bitmap occupies 3 full
> > cachelines and 2 partial ones)).
> 
> We can always optimize by having a bit for the fairly common all-dirty
> case and only track and look at the array if that is not the case.

Yes, it would help to make the ranges in the bit array better defined
than the semi-opencoded logic that's there now.  (I'm whining
specifically about the test_bit calls sprinkled around.)  Once that's
done, it shouldn't be hard to add one more bit for the all-dirty state,
though I'd want to see numbers proving that it actually saves us time
anywhere.
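
Roughly what I have in mind (a sketch only, ignoring the uptodate bits;
IFS_ALL_DIRTY is a made-up name, building on the iomap_folio_state
sketch earlier in this mail):

enum {
        IFS_ALL_DIRTY = 0,              /* hypothetical fast-path bit */
        IFS_FIRST_BLOCK_BIT,            /* per-block dirty bits start here */
};

static bool ifs_block_is_dirty(struct iomap_folio_state *ifs,
                unsigned int block)
{
        if (test_bit(IFS_ALL_DIRTY, ifs->state))
                return true;
        return test_bit(IFS_FIRST_BLOCK_BIT + block, ifs->state);
}

static void ifs_set_all_dirty(struct iomap_folio_state *ifs)
{
        /* one bit set, instead of touching several cachelines of bitmap */
        set_bit(IFS_ALL_DIRTY, ifs->state);
}

That keeps the bitmap walk off the common path, but again, numbers
first.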

--D

> > filesystems right now.  Dave Howells' netfs infrastructure is trying
> > to solve the problem for everyone (and he's been looking at iomap as
> > inspiration for what he's doing).
> 
> Btw, I never understood why the network file systems don't just use
> iomap.  There is nothing block specific in the core iomap code.


