XFS fragmentation on file append

Keyur Govande <keyurgovande@xxxxxxxxx> · Mon, 7 Apr 2014 18:53:46 -0400

Hello,

I'm currently investigating a MySQL performance degradation on XFS due
to file fragmentation.

The box has 12 cores and a 16-drive RAID 10 array with a 1GB
battery-backed cache.

xfs_info shows:
meta-data=/dev/sda4              isize=256    agcount=24, agsize=24024992 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=576599552, imaxpct=5
         =                       sunit=16     swidth=512 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=281552, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The kernel version is: 3.14.0-1.el6.elrepo.x86_64 and the XFS
partition is mounted with: rw,noatime,allocsize=128m,inode64,swalloc.
The partition is 2TB in size and 40% full to simulate production.

Here's a test program that appends 512KB like MySQL does (write and
then fsync). To exacerbate the issue, it loops a bunch of times:
https://gist.github.com/keyurdg/961c19175b81c73fdaa3
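For readers who cannot open the gist, the append pattern described above
(write 512KB at the end of the file, then fsync, in a loop) could be
sketched roughly as follows. This is not the gist's code: the function
name append_chunks, the fill byte, and the error handling are assumptions
made for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define APPEND_CHUNK (512 * 1024) /* 512KB per append, as described above */

/* Hypothetical helper: append `iterations` chunks to `path`, calling
 * fsync() after every chunk. Returns the final end offset, or -1 on error. */
off_t append_chunks(const char *path, int iterations)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    char *buf = malloc(APPEND_CHUNK);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', APPEND_CHUNK);

    /* Start appending at the current end of the file. */
    off_t offset = lseek(fd, 0, SEEK_END);
    int ok = (offset >= 0);

    for (int i = 0; ok && i < iterations; i++) {
        size_t done = 0;
        while (done < APPEND_CHUNK) { /* handle short writes */
            ssize_t n = pwrite(fd, buf + done, APPEND_CHUNK - done,
                               offset + (off_t)done);
            if (n <= 0) {
                ok = 0;
                break;
            }
            done += (size_t)n;
        }
        if (ok && fsync(fd) != 0) /* flush each append, like the test does */
            ok = 0;
        if (ok)
            offset += APPEND_CHUNK;
    }

    free(buf);
    if (close(fd) != 0)
        ok = 0;
    return ok ? offset : -1;
}
```

Calling append_chunks() on a fresh file on the XFS mount with a large
iteration count should approximate the workload; the exact loop count
used in the gist is not reproduced here.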

When run, this creates ~9500 extents most of length 1024. cat'ing the
file to /dev/null after dropping the caches reads at an average of 75
MBps, way less than the hardware is capable of.
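One way to obtain an extent count programmatically is the Linux FIEMAP
ioctl; the sketch below is an assumption about how one could measure it,
not necessarily how the number above was produced (xfs_bmap reports the
same information). The helper name count_extents is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* constants */

/* Hypothetical helper: return the number of extents mapped for `path`,
 * or -1 if the file cannot be opened or the filesystem lacks FIEMAP. */
long count_extents(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct fiemap fm;
    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET; /* cover the whole file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;   /* flush dirty data before mapping */
    fm.fm_extent_count = 0;           /* 0 = only report how many extents */

    int rc = ioctl(fd, FS_IOC_FIEMAP, &fm);
    close(fd);
    if (rc < 0)
        return -1;
    return (long)fm.fm_mapped_extents;
}
```

With fm_extent_count set to zero the kernel fills in fm_mapped_extents
without returning the extents themselves, which is enough to compare the
two test programs.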

When I add a posix_fallocate before calling pwrite() as shown here
https://gist.github.com/keyurdg/eb504864d27ebfe7b40a the file
fragments an order of magnitude less (~30 extents), and cat'ing to
/dev/null proceeds at ~1GBps.
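The variant with preallocation might look roughly like the sketch below:
the range about to be written is reserved with posix_fallocate() before
each pwrite(). This is an illustration, not the gist's code; in
particular, how much the gist preallocates per call is not stated above,
so reserving exactly one 512KB chunk is an assumption, as is the name
append_chunks_prealloc.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PREALLOC_CHUNK (512 * 1024) /* 512KB per append */

/* Hypothetical helper: like a plain append loop, but reserves each chunk
 * with posix_fallocate() before writing it. Returns the final end offset,
 * or -1 on error. */
off_t append_chunks_prealloc(const char *path, int iterations)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    char *buf = malloc(PREALLOC_CHUNK);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'x', PREALLOC_CHUNK);

    off_t offset = lseek(fd, 0, SEEK_END);
    int ok = (offset >= 0);

    for (int i = 0; ok && i < iterations; i++) {
        /* posix_fallocate() returns 0 on success or an error number;
         * it does not set errno. Assumed size: one chunk per call. */
        if (posix_fallocate(fd, offset, PREALLOC_CHUNK) != 0) {
            ok = 0;
            break;
        }
        size_t done = 0;
        while (done < PREALLOC_CHUNK) { /* handle short writes */
            ssize_t n = pwrite(fd, buf + done, PREALLOC_CHUNK - done,
                               offset + (off_t)done);
            if (n <= 0) {
                ok = 0;
                break;
            }
            done += (size_t)n;
        }
        if (ok && fsync(fd) != 0)
            ok = 0;
        if (ok)
            offset += PREALLOC_CHUNK;
    }

    free(buf);
    if (close(fd) != 0)
        ok = 0;
    return ok ? offset : -1;
}
```

Note that posix_fallocate() extends the file size to cover the reserved
range, so the file already has its new length before the data is written.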

The same behavior is seen even when the allocsize option is removed
and the partition remounted.

This is somewhat unexpected. I'm working on a patch to add fallocate
to MySQL, but wanted to check here first in case I'm missing anything
obvious.

Cheers,
Keyur.