Quoting Dave Chinner (2013-06-17 21:58:06)
> On Mon, Jun 17, 2013 at 10:34:57AM -0400, Chris Mason wrote:
> > Quoting Dave Chinner (2013-06-14 22:50:50)
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > >
> > > Doing writeback on lots of little files causes terrible IOPS storms
> > > because of the per-mapping writeback plugging we do. This
> > > essentially causes immediate dispatch of IO for each mapping,
> > > regardless of the context in which writeback is occurring.
> > >
> > > IOWs, running a concurrent write-lots-of-small-4k-files workload
> > > using fsmark on XFS results in a huge number of IOPS being issued
> > > for data writes. Metadata writes are sorted and plugged at a high
> > > level by XFS, so they aggregate nicely into large IOs. However,
> > > data writeback IOs are dispatched in individual 4k IOs, even when
> > > the blocks of two consecutively written files are adjacent.
> > >
> > > Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
> > > metadata CRCs enabled.
> > >
> > > Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)
> >
> > I'm a little worried about this one, just because of the impact on
> > ssds from plugging in the aio code:
>
> I'm testing on SSDs, but the impact has nothing to do with the
> underlying storage.
>
> > https://lkml.org/lkml/2011/12/13/326
>
> Oh, that's a completely different situation - it's application
> submitted IO where IO latency is a determining factor in
> performance.
>
> This is for background writeback, where IO latency is not a primary
> performance issue - maximum throughput is what we are trying to
> achieve here. For writeback, well-formed IO has a much greater
> impact on throughput than low latency IO submission, even for SSD
> based storage.

Very true, but at the same time we do wait for background writeback
sometimes. It's worth a quick test...

> > How exactly was your FS created? I'll try it here.
>
> The host has an XFS filesystem on a md RAID0 of 4x40GB slices off
> larger SSDs:
>
> $ cat /proc/mdstat
> Personalities : [raid0]
> md2 : active raid0 sdb1[0] sde1[3] sdd1[2] sdc1[1]
>       167772032 blocks super 1.2 32k chunks
>
> built with mkfs.xfs defaults. A sparse 100TB file is created and
> then fed to KVM with cache=none,virtio.
>
> The guest formats the 100TB device using default mkfs.xfs parameters
> and uses default mount options, so it's a 100TB filesystem with
> about 150GB of real storage in it...
>
> The underlying hardware controller that the SSDs are attached to is
> limited to roughly 27,000 random 4k write IOPS, and that's the IO
> pattern that writeback is splattering at the device.

Ok, I'll try something less exotic here ;)

-chris
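[Editorial note: for readers unfamiliar with the mechanism being argued
about, the kernel facility in question is the block plugging API,
blk_start_plug()/blk_finish_plug(). Holding one plug across the
writeback of many inodes lets adjacent small data writes merge in the
plug list before dispatch, instead of plugging and unplugging per
mapping. The sketch below is illustrative only and is not Dave's
actual patch; writeback_inodes_plugged() is a hypothetical helper, and
the inode list member name differs between kernel versions.]

#include <linux/blkdev.h>
#include <linux/fs.h>
#include <linux/writeback.h>

/*
 * Illustrative sketch only -- not the patch under discussion.  Hold a
 * single plug across the writeback of a whole batch of dirty inodes so
 * that adjacent 4k writes from consecutively written files can merge
 * in the plug list before dispatch, rather than plugging (and so
 * immediately flushing) each mapping on its own.
 *
 * writeback_inodes_plugged() is a hypothetical helper; the list member
 * is i_io_list in recent kernels (i_wb_list in 3.10-era trees).
 */
static void writeback_inodes_plugged(struct list_head *io_list,
				     struct writeback_control *wbc)
{
	struct inode *inode;
	struct blk_plug plug;

	blk_start_plug(&plug);		/* hold bios on the per-task plug list */

	list_for_each_entry(inode, io_list, i_io_list)
		do_writepages(inode->i_mapping, wbc);

	blk_finish_plug(&plug);		/* single unplug: merged, larger IOs */
}

[The trade-off Chris raises applies here too: a wider plug adds
submission latency for IO held in the plug list, which hurts
application-submitted IO (the aio case he links) but matters much less
for background writeback, where aggregate throughput dominates.]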