On Mon, Apr 11, 2016 at 10:14:06PM +0800, Songbo Wang wrote:
> Hi xfsers:
>
> I got some troubles with the performance of xfs.
> The environment is:
> xfs version is 3.2.1,
> centos 7.1,
> kernel version: 3.10.0-229.el7.x86_64,
> pcie-ssd card,
> mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40 -l
> size=1024m
> mount: mount /dev/hioa2 /mnt/ -t xfs -o
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
>
> I use the following command to test iops: fio -ioengine=libaio -bs=4k
> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
> 4KB randwrite test" -iodepth=64 -runtime=60
>
> The results are normal at the beginning, about 210k±, but some seconds
> later they drop to around 19k±.

Looks like the workload runs out of log space due to all the allocation
transactions being logged, which then causes new transactions to start
tail-pushing the log to flush dirty metadata. This is needed to make
more space in the log for incoming dio writes that require allocation
transactions, and it will block IO submission until there is space
available in the log.

Let's face it: all that test does is create a massively fragmented 50GB
file, so you're going to have a lot of metadata to log. Do the maths -
if it runs at 200kiops for a few seconds, it's created a million
extents. And it's doing random inserts into the extent btree, so it's
repeatedly dirtying the entire extent btree. This triggers journal
commits quite frequently, because a large amount of metadata is being
dirtied.

e.g. at ~500 extent records per 4k block, a million extents require
2000 leaf blocks to store them all. That's 8MB of metadata per million
extents that this workload is generating and repeatedly dirtying. Then
there's the other metadata, like the free space btrees, that is also
being repeatedly dirtied, so it would not be unexpected for a workload
like this on a high-IOPS device to be allocating 100MB of metadata
every few seconds, with the amount being journalled steadily increasing
until the file is fully populated.

> I did a second test:
> umount the /dev/hioa2,
> fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite
> -filename=/dev/hioa2 -name="EBS 8KB randwrite test" -iodepth=64
> -runtime=60
>
> The results were normal; the iops was about 210k± all the time.

That's not an equivalent test - it's being run direct to the block
device, not to a file on the filesystem on the block device, so you
won't see the artifacts that are a result of creating worst case file
fragmentation....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
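
PS: if you want to see the fragmentation and check the arithmetic above
for yourself, here's a rough sketch. The /mnt/test path is just the file
from your fio run, and the ~500 records per 4k leaf block is the same
estimate as above, not an exact on-disk count:

    # count the extents in the fragmented test file (xfs_bmap prints one
    # extent per line after a header line, so this is approximate)
    xfs_bmap /mnt/test | wc -l

    # back-of-the-envelope metadata estimate for a million extents
    echo "$((1000000 / 500)) leaf blocks"       # ~2000 leaf blocks
    echo "$((1000000 / 500 * 4)) KB of leaves"  # ~8000KB, i.e. ~8MB

And remember that's just the leaf blocks created once - the repeated
re-dirtying of those blocks at every journal commit is what multiplies
the log traffic.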