Hi Dave,
Thank you for your reply. I did some tests today, described as follows:
Deleted the existing test file and reran the test: fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
The IOPS result was about 19k (per second). I kept running fio against this test file until it was completely filled, then ran the same test case again; the result was about 210k (per second). (The results I mentioned yesterday were partial: I had reused the same test file several times, and the results degraded because the file was not yet fully filled.)
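For reference, one way to check how fragmented the test file becomes (a sketch using the standard xfsprogs tools; /mnt/test and /dev/hioa2 are the same paths as in the commands above):

  # list the extent map of the test file (one line per extent, plus a header)
  xfs_bmap -v /mnt/test | wc -l
  # report overall filesystem fragmentation, read-only, against the mounted device
  xfs_db -r -c "frag" /dev/hioa2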
I also tried remaking the filesystem with the following command to increase the internal log size, inode size, and agcount:
mkfs.xfs /dev/hioa2 -f -n size=64k -i size=2048,align=1 -d agcount=2045 -l size=512m
but it did not help the results.
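(For completeness, the resulting geometry can be double-checked after mounting, e.g.:

  # print the log size, agcount and inode size of the mounted filesystem
  xfs_info /mnt

assuming the filesystem is mounted at /mnt as before.)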
Any suggestions for dealing with this problem?
I really appreciate your feedback.
songbo
2016-04-12 7:10 GMT+08:00 Dave Chinner <david@xxxxxxxxxxxxx>:
On Mon, Apr 11, 2016 at 10:14:06PM +0800, Songbo Wang wrote:
> Hi xfsers:
>
> I am having some trouble with xfs performance.
> The environment is:
> xfs version is 3.2.1,
> centos 7.1,
> kernel version:3.10.0-229.el7.x86_64.
> pcie-ssd card,
> mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40 -l
> size=1024m.
> mount: mount /dev/hioa2 /mnt/ -t xfs -o
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> I used the following command to test IOPS: fio -ioengine=libaio -bs=4k
> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
> 4KB randwrite test" -iodepth=64 -runtime=60
> The results are normal at the beginning, about 210k±, but a few
> seconds later they drop to about 19k±.
Looks like the workload runs out of log space due to all the
allocation transactions being logged, which then causes new
transactions to start tail pushing the log to flush dirty metadata.
This is needed to make more space in the log for incoming dio
writes that require allocation transactions. This will block IO
submission until there is space available in the log.
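One rough way to see this while the fio job is running is to watch the XFS runtime stats (a sketch; once tail pushing starts, the log and push_ail counters should start climbing quickly):

  # sample the log and AIL-push counters once a second while fio runs
  while true; do
      grep -E '^(log|push_ail)' /proc/fs/xfs/stat
      sleep 1
  done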
Let's face it, all that test does is create a massively fragmented
50GB file, so you're going to have a lot of metadata to log. Do the
maths - if it runs at 200kiops for a few seconds, it's created a
million extents.
And it's doing random insert on the extent btree, so
it's repeatedly dirtying the entire extent btree. This will trigger
journal commits quite frequently as this is a large amount of
metadata that is being dirtied. e.g. at 500 extent records per 4k
block, a million extents will require 2000 leaf blocks to store them
all. That's about 8MB of metadata per million extents that this workload
is generating and repeatedly dirtying.
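As a back-of-envelope check of those numbers:

  # 1,000,000 extents at 500 extent records per 4k leaf block
  echo $(( 1000000 / 500 ))   # -> 2000 leaf blocks
  echo $(( 2000 * 4 ))        # -> 8000 KB, i.e. ~8MB of leaf blocks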
Then there's also other metadata, like the free space btrees, that
is also being repeatedly dirtied, etc, so it would not be unexpected
to see a workload like this on high IOPS devices allocating 100MB of
metadata every few seconds and the amount being journalled steadily
increasing until the file is fully populated.
> I did a second test,
> umount the /dev/hioa2,
> fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite
> -filename=/dev/hioa2 -name="EBS 8KB randwrite test" -iodepth=64 -runtime=60
> The results were normal; the IOPS stayed at about 210k± the whole time.
That's not an equivalent test - it's being run direct to the block
device, not to a file on the filesystem on the block device, and so
you won't see artifacts that are a result of creating worst-case
file fragmentation....
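A closer apples-to-apples run would be to fully populate the file first and then repeat the random-write job against it, which lines up with the ~210k figure reported at the top of this thread for an already fully populated file (a sketch; the prefill job name and 1M block size are arbitrary choices, not recommendations):

  # hypothetical prefill pass: write the whole file once sequentially so
  # the extent map is fully allocated before the random-write test
  fio -ioengine=libaio -bs=1M -direct=1 -rw=write -size=50G \
      -filename=/mnt/test -name=prefill
  # then rerun the original 4k randwrite job against the populated file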
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx