Performance issue with small files, and weird "workaround"

We are setting up a new set of servers to run the FSF/GNU
infrastructure, and we are seeing some strange behavior. Inside a Qemu
guest, reading small files from a mounted rbd image is very slow. The
"real-world" test I use is to copy the Linux kernel source tree from
the filesystem to /dev/shm. On the host server that takes ~10 seconds
from a mapped rbd image, but inside the VM it takes over a minute. The
same test takes <20 seconds when the VM storage is local LVM. Writing
the files to the rbd-backed disk also takes ~10 seconds.
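
For reference, the comparison is just a cold-cache copy of the same
tree (the path below is illustrative; the source tree lives on the
rbd-backed filesystem):

echo 1 > /proc/sys/vm/drop_caches
time cp -a /srv/linux-src /dev/shm   # ~10s on the host, >1 minute in the VM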

I suspect a problem with readahead and caching, so as a test I copied
those same files onto a loop device inside the VM (backed by a file
stored on the same rbd image); reading them back takes ~10 seconds. I
drop the caches before each test.

This is how I run that test:

# Create a 5GB image file on the rbd-backed filesystem, format it and
# loop-mount it inside the VM
dd if=/dev/zero of=test bs=1G count=5
mkfs.xfs test
mount -o loop test /mnt
cp -a linux-src /mnt
# Read the tree back with cold caches
echo 1 > /proc/sys/vm/drop_caches
time cp -a /mnt/linux-src /dev/shm

I've tested many different parameters (readahead, partition alignment,
filesystem formatting, block queue settings, etc.) with little change
in performance. Wrapping the files in a loop device seems to change
things in a way I cannot otherwise replicate at the upper layers.
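
To give an idea of the kind of knobs I've been turning, this is
roughly what I tried on the guest block device (device name is an
example; none of it made a noticeable difference):

# two ways to set 4M readahead on the guest disk
blockdev --setra 8192 /dev/sdb
echo 4096 > /sys/block/sdb/queue/read_ahead_kb
# scheduler and queue depth
echo noop > /sys/block/sdb/queue/scheduler
echo 256 > /sys/block/sdb/queue/nr_requests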

Is this expected or am I doing something wrong?

Here are the specs:
Ceph 10.2.7 on an Ubuntu xenial derivative. Kernel 4.4, Qemu 2.5.
2 Ceph servers running 6x 1TB SSD OSDs each.
2 Qemu/KVM servers managed with libvirt.
All connected with 20GbE (bonded). Every server has 2x 16-core Opteron
CPUs, 2GB of RAM per OSD, and plenty of RAM on the KVM host servers.

osd pool default size = 2
osd pool default min size = 2
osd pool default pg num = 512
osd pool default pgp num = 512
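
In case it matters, this is how I check that the pool actually picked
up those defaults (libvirt-pool as in the bench runs below):

ceph osd pool get libvirt-pool size
ceph osd pool get libvirt-pool min_size
ceph osd pool get libvirt-pool pg_num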

lsblk -t
NAME  ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
sdb           0    512      0     512     512    0 noop      128   0    2G
loop0         0    512      0     512     512    0           128   0    0B

Some numbers:
rados bench -p libvirt-pool 10 write: avg 339.508 MB/s, avg lat 0.186789 s
rados bench -p libvirt-pool 100 rand: avg 1111.42 MB/s, avg lat 0.0534118 s
Random small-file read:
fio 4k randread inside the VM: avg 2246 KB/s, avg lat 1708 usec, 600 IOPS
Sequential small-file read with readahead:
fio 4k seq read inside the VM: avg 308351 KB/s, avg lat 11 usec, 55k IOPS
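
The fio invocations are along these lines (parameters shown are
examples, not the exact job files):

fio --name=randread --filename=/mnt/testfile --size=1G --bs=4k \
    --rw=randread --ioengine=libaio --iodepth=1 --runtime=30 --time_based
fio --name=seqread --filename=/mnt/testfile --size=1G --bs=4k \
    --rw=read --ioengine=libaio --iodepth=1 --runtime=30 --time_based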

The rbd images are attached with virtio-scsi (no difference using
virtio-blk), and the guest block devices have 4M readahead set (no
difference if disabled). The rbd cache is enabled on both server and
client (no difference if disabled). Forcing rbd readahead makes no
difference either.
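
For completeness, the client-side cache/readahead options I've been
toggling live in the [client] section of ceph.conf; the values below
are examples of what gets tried, not the exact settings in use:

[client]
rbd cache = true
# 64MB cache (example value)
rbd cache size = 67108864
# force 4MB rbd readahead and never auto-disable it
rbd readahead max bytes = 4194304
rbd readahead disable after bytes = 0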

Please advise!
-- 
Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
GPG Key: 05EF 1D2F FE61 747D 1FC8  27C3 7FAC 7D26 472F 4409
https://fsf.org | https://gnu.org



