Re: [PATCH] writeback: plug writeback at a high level

On Mon, Jun 17, 2013 at 10:34:57AM -0400, Chris Mason wrote:
> Quoting Dave Chinner (2013-06-14 22:50:50)
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > Doing writeback on lots of little files causes terrible IOPS storms
> > because of the per-mapping writeback plugging we do. This
> > essentially causes immediate dispatch of IO for each mapping,
> > regardless of the context in which writeback is occurring.
> > 
> > IOWs, a concurrent write-lots-of-small-4k-files fsmark workload
> > on XFS results in a huge number of IOPS being issued for data
> > writes.  Metadata writes are sorted and plugged at a high level by
> > XFS, so aggregate nicely into large IOs. However, data writeback IOs
> > are dispatched in individual 4k IOs, even when the blocks of two
> > consecutively written files are adjacent.
> > 
> > Test VM: 8p, 8GB RAM, 4xSSD in RAID0, 100TB sparse XFS filesystem,
> > metadata CRCs enabled.
> > 
> > Kernel: 3.10-rc5 + xfsdev + my 3.11 xfs queue (~70 patches)
> 
> I'm a little worried about this one, just because of the impact on ssds
> from plugging in the aio code:

I'm testing on SSDs, but the impact has nothing to do with the
underlying storage.

> https://lkml.org/lkml/2011/12/13/326

Oh, that's a completely different situation - it's application-submitted
IO, where IO latency is a determining factor in performance.

This is for background writeback, where IO latency is not a primary
performance issue - maximum throughput is what we are trying to
achieve here. For writeback, well-formed IO has a much greater
impact on throughput than low latency IO submission, even for SSD
based storage.
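
To be concrete about what "plug at a high level" means: instead of the
per-mapping plug/unplug we do now, hold a single plug across the whole
writeback pass so the bios from lots of small files can be merged and
sorted before dispatch. Roughly this shape - an illustrative sketch,
not the actual patch hunk, and writeback_all_dirty_inodes() is a
made-up stand-in for the real writeback_sb_inodes() loop:

#include <linux/backing-dev.h>
#include <linux/blkdev.h>
#include <linux/writeback.h>

/*
 * Sketch only: hold the plug across writeback of many inodes so that
 * bios from adjacent small files sit in the task's plug list and get
 * merged into larger requests before they are dispatched, instead of
 * being unplugged per address_space.
 */
static void high_level_writeback_pass(struct bdi_writeback *wb,
                                      struct writeback_control *wbc)
{
        struct blk_plug plug;

        blk_start_plug(&plug);
        writeback_all_dirty_inodes(wb, wbc);    /* hypothetical helper */
        blk_finish_plug(&plug);                 /* dispatch merged IO */
}

That gives data IO the same treatment the metadata already gets from
being sorted and plugged at a high level by XFS.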

> How exactly was your FS created?  I'll try it here.

The host has an XFS filesystem on an md RAID0 of 4x40GB slices off
larger SSDs:

$ cat /proc/mdstat
Personalities : [raid0] 
md2 : active raid0 sdb1[0] sde1[3] sdd1[2] sdc1[1]
      167772032 blocks super 1.2 32k chunks

The filesystem was built with the mkfs.xfs defaults. A sparse 100TB
file is created on it and then fed to KVM with cache=none,virtio.

The guest formats the 100TB device using default mkfs.xfs parameters
and uses default mount options, so it's a 100TB filesystem with
about 150GB of real storage in it...

The underlying hardware controller that the SSDs are attached to is
limited to roughly 27,000 random 4k write IOPS, and that's the IO
pattern that writeback is splattering at the device.
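
(At 4k a pop, 27,000 IOPS is only a bit over 100MB/s, which is why
well-formed IO matters so much more than submission latency here.)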

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



