Hi, Cephers. I would like to hear your ideas about a strange situation we have in one of our clusters. It's a Luminous 12.2.12 cluster. Recently we added 3 nodes with 10 SSD OSDs each and dedicated them to the SSD pool that backs our OpenStack volumes. Initial tests went well: IOPS were great, throughput was perfect, all good. Then the first real usage arrived: very limited IOPS (~450), disk utilization near 100%, and throughput below 1 MB/s put us in tears.

After some investigation we found that this situation only occurs when all of these conditions are met:

1. The disk is an RBD volume (the same test from the same server against local disks was fine)
2. The file system is XFS (no problems with ext4)
3. The file system block size is bigger than the write size
4. Only one fio job (numjobs=1) is used

When at least one of these conditions is not met, we get ~40k IOPS, great throughput, etc.

We ran fio tests with different values, and the pattern is quite clear (rough fio commands are in the P.S. below): if the write size is 4 KB (same as the block size), IOPS go up to ~40k. If the write size is 3 KB, it drops to ~450 IOPS, and from that point it doesn't matter how small the write is - it's always ~450 IOPS. After changing the file system block size to 2 KB the situation is the same: great speed until the write is smaller than 2 KB. If we raise the fio parameter "numjobs" to 10, we get the maximum possible IOPS, ~40k, which is far more than a simple 10x increase.

Any ideas what is going on, and why writes smaller than the block size hurt performance so badly on XFS, while ext4 has no problems?

Thank you for all the ideas!

Arvydas
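P.S. For reference, this is roughly how we tested. The mount point, ioengine, iodepth, direct and runtime settings below are simplified placeholders - only the write size (--bs), the file system block size and --numjobs are the values we actually varied:

  # XFS block size is chosen at mkfs time, e.g. 4 KB:
  # mkfs.xfs -b size=4096 /dev/rbd0

  # "fast" case: 4 KB writes, equal to the 4 KB XFS block size, one job -> ~40k IOPS
  fio --name=rbdtest --filename=/mnt/rbd-xfs/testfile --size=1G \
      --rw=randwrite --ioengine=libaio --iodepth=32 --direct=1 \
      --bs=4k --numjobs=1 --runtime=60 --time_based --group_reporting

  # "slow" case: write size (3 KB) smaller than the XFS block size -> ~450 IOPS
  fio --name=rbdtest --filename=/mnt/rbd-xfs/testfile --size=1G \
      --rw=randwrite --ioengine=libaio --iodepth=32 --direct=1 \
      --bs=3k --numjobs=1 --runtime=60 --time_based --group_reporting

  # with --numjobs=10 the 3 KB case goes back up to ~40k IOPS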