> 150MB/s isn't correct. Should be closer to 450MB/s. This makes it
> appear that you're writing all these files to a single directory. If
> you're writing them fairly evenly to 3 directories or a multiple of 3,
> you should see close to 450MB/s, if using mdraid linear over 3 P400
> RAID1 pairs. If this is what you're doing then something seems wrong
> somewhere. Try unpacking a kernel tarball. Lots of subdirectories to
> exercise all 3 AGs thus all 3 spindles.

The spindles were exercised; I watched it with iostat. Maybe I could have reached more with more parallelism, but that wasn't my goal at all. Although, over the course of these experiments, I came to doubt that the controller could even handle this data rate.

>> simple copy of the tar onto the XFS file system yields the same linear
>> performance, the same as with ext4, btw. So 150 MB/sec seems to be the
>> best these disks can do, meaning that theoretically, with 3 AGs, it
>> should be able to reach 450 MB/sec under optimal conditions.
>
> The optimal condition, again, requires writing 3 of this file to 3
> directories to hit ~450MB/s, which you should get close to if using
> mdraid linear over RAID1 pairs. XFS is a filesystem after all, so its
> parallelism must come from manipulating usage of filesystem structures.
> I thought I explained all of this previously when I introduced the "XFS
> concat" into this thread.

The optimal condition would be 3 parallel writes of huge files, which can easily be written linearly, not thousands of tiny files.

>> But then I guess I'm back to ext4 land. XFS just doesn't offer enough
>> benefits in this case to justify the hassle.
>
> If you were writing to only one directory I can understand this
> sentiment. Again, if you were writing 3 directories fairly evenly, with
> the md concat, then your sentiment here should be quite different.

Haha, I made a U-turn on this one. XFS is back on the table (and on the disks now) ;).
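For reference, the parallel-write test suggested above (one big file into each of three directories, so each top-level directory lands in a different AG and thus on a different spindle of the concat) can be sketched roughly like this. The TARGET path and SIZE_MB default are placeholders, not anything from the thread; a real measurement would point TARGET at the XFS mount and use files several GB in size.

```shell
# Sketch only: write one large file into each of three directories in
# parallel, to exercise all three AGs of an "XFS concat" (mdraid linear
# over RAID1 pairs). TARGET and SIZE_MB are illustrative defaults.
TARGET=${TARGET:-/tmp/agtest}
SIZE_MB=${SIZE_MB:-8}        # use e.g. 4096 for a real measurement

mkdir -p "$TARGET/d1" "$TARGET/d2" "$TARGET/d3"
for d in d1 d2 d3; do
    # conv=fsync forces the data to stable storage before dd exits,
    # so the timing is honest; the three dd's run concurrently.
    dd if=/dev/zero of="$TARGET/$d/big.img" bs=1M count="$SIZE_MB" \
       conv=fsync 2>/dev/null &
done
wait
ls -l "$TARGET"/d1 "$TARGET"/d2 "$TARGET"/d3
```

While it runs, `iostat -xm 1` in another terminal shows whether all three RAID1 pairs are actually busy, which is the same check described above.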
When I thought I was done, I wanted to restore a few large KVM images which had been on the disks prior to the RAID reconfiguration. With ext4, I watched iostat report writes at 130MB/s for a while. After 2 or 3 minutes, throughput broke down completely and languished at 30-40MB/s for many minutes, even after I had SIGSTOPed the writing process. During that time it was nearly impossible to use vim to edit a file on the ext4 partition; it would pause for tens of seconds at a time. It's not even clear why it broke down so badly: from another seekwatcher sample I took, the writing looked fairly linear.

So I threw XFS back in, restarted the restore, and it went very smoothly while still providing acceptable interactivity.

XFS is not a panacea (obviously), it may be a bit slower in many cases, and it doesn't seem to cope well with fragmented free space (which is what this entire thread is really about), but overall it feels more well-rounded. After all, I don't really care how much it writes per time unit, as long as it's not ridiculously little and it doesn't bring everything else to a halt.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs