Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)

[ ... ]

> Yes, it does have 256 MB BBWC, and it is enabled. When I
> disabled it, the time needed would rise from 120 sec in the
> BBWC case to a whopping 330 sec.

> IIRC, I did the benchmark with barrier=0, but changing this did not
> make a big difference.

Note that the syntax differs between 'ext4' and XFS: 'ext4'
accepts 'barrier=0', but XFS wants 'nobarrier', and if you pass
'barrier=0' to XFS it will mount the filetree *with* barriers
(just double checked).
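
For reference, a minimal sketch of the two spellings (device
and mount point names are made up):

  # ext4: barriers off
  mount -o barrier=0 /dev/sdb1 /mnt/test

  # XFS: barriers off ('nobarrier', not 'barrier=0')
  mount -o nobarrier /dev/sdb1 /mnt/test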

> Nothing did; that’s what frustrated me a bit ;). I also tried
> different Linux IO elevators, as you suggested in your other
> response, without any measurable effect.

Here a lot depends on the firmware of the P400: with a BBWC it
can in theory completely ignore the request ordering and the
barriers it receives from the Linux side (and barriers *might*
be disabled anyhow), so by and large the Linux elevator should
not matter.

  Note: what would suit this very narrow case is something like
  the 'anticipatory' elevator; that has been removed, but IIRC
  'cfq' can be tweaked to behave much like it.
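
  A sketch of the sort of tweak I mean, assuming the array shows
  up as 'sdb' (device name and values are just placeholders to
  experiment with):

    # make sure 'cfq' is the elevator on that device
    echo cfq > /sys/block/sdb/queue/scheduler
    # let 'cfq' idle longer waiting for nearby requests, which
    # is roughly what 'anticipatory' used to do (default is 8)
    echo 16 > /sys/block/sdb/queue/iosched/slice_idle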

But the times you report are consistent with your Linux-side
seek graph being what actually happens at the P400 level too,
which is something that should not happen with a BBWC.

If that's the case, tweaking the Linux-side scheduling might
help, for example increasing 'queue/nr_requests' and
'device/queue_depth' a lot (apparently 'nr_requests' should be
at least twice 'queue_depth' in most cases).
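
Something like this, as a sketch (device name assumed, values
only illustrative; with the 'cciss' driver the /sys/block entry
is named like 'cciss!c0d0', and 'device/queue_depth' may not be
exposed there at all):

  echo 256 > /sys/block/sdb/device/queue_depth
  echo 512 > /sys/block/sdb/queue/nr_requests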

Or else ensuring that the P400 does reorder requests and is not
running its cache in write-through mode, given that it has a
BBWC.
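
If 'hpacucli' is the management tool on that box, something
along these lines shows and adjusts the cache setup (the slot
number is a guess for your system):

  hpacucli ctrl slot=0 show detail
  # shift the read/write cache ratio towards writes
  hpacucli ctrl slot=0 modify cacheratio=25/75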

Overall your test does not seem very notable to me, except that
it is strange that in the XFS 4-AG case the generated IO stream
(at the Linux level) seeks incessantly between the 4 AGs instead
of in phases, and that this apparently gets passed through to
the disks by the P400 even though it has a BBWC.

It is not clear to me why the seeks among the 4 AGs happen in
such a tightly interleaved way (barriers? the way journaling
works?) instead of in larger bursts.

The suggestion by another commenter to use 'rotorstep' (probably
set to a high value) may then help, as it bunches consecutive
file allocations into the same AG.
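
Note that 'rotorstep' is a sysctl rather than a mount option,
and IIRC it only applies with the 'inode32' allocator; the
value below is just an arbitrary example (valid range 1-255):

  sysctl -w fs.xfs.rotorstep=64
  # equivalently:
  echo 64 > /proc/sys/fs/xfs/rotorstep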

> The stripe size is this, btw.: su=16k,sw=4

BTW congratulations for limiting your RAID6 set to 4+2, and
using a relatively small chunk size compared to that chosen by
many others.

But the resulting full stripe is still pretty large for this
case: 64KiB when your average file size is around 12KiB.
Potentially lots of RMW, and little opportunity to take
advantage of the higher parallelism of having 4 AGs with 4
independent streams of data.
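
To spell out the arithmetic behind that 64KiB (my reading of
'su=16k,sw=4'): the full data stripe is

  16KiB per chunk * 4 data disks = 64KiB

so a ~12KiB file does not even fill one 16KiB chunk, let alone
a full stripe, and each such write forces the controller into a
read-modify-write cycle to update the RAID6 parity.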

As mentioned in another comment, I got nearly the same 'ext4'
writeout rate on a single disk...

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs


