We are setting up a new set of servers to run the FSF/GNU infrastructure, and we are seeing some strange behavior: reading small files from a mounted RBD image is very slow from inside a QEMU guest.

The "real-world" test I use is copying the Linux source tree from the filesystem to /dev/shm. On the host server that takes ~10 seconds from a mapped RBD image, but in the VM it takes over a minute. The same test takes <20 seconds when the VM storage is local LVM, and writing the files to the RBD-backed disk also takes ~10 seconds, so the slowdown only shows up for small-file reads.

I suspect a problem with readahead and caching, so as a test I copied those same files into a loop device inside the VM (backed by a file stored on the same RBD image); reading them back takes ~10 seconds. I drop the caches before each test. This is how I run that test:

  dd if=/dev/zero of=test bs=1G count=5
  mkfs.xfs test
  mount test /mnt                       # mount sets up the loop device automatically
  cp -a linux-src /mnt
  echo 1 > /proc/sys/vm/drop_caches     # 1 = page cache; 3 would also drop dentries/inodes
  time cp -a /mnt/linux-src /dev/shm

I've tested many different parameters (readahead, partition alignment, filesystem formatting, block queue settings, etc.) with little change in performance. Wrapping the files in a loop device changes the behavior in a way that I cannot otherwise replicate at the upper layers. Is this expected, or am I doing something wrong?

Here are the specs:

Ceph 10.2.7 on an Ubuntu Xenial derivative. Kernel 4.4, QEMU 2.5.
2 Ceph servers running 6x 1TB SSD OSDs each.
2 QEMU/KVM servers managed with libvirt.
All connected with 20GbE (bonded).
Every server has 2x 16-core Opteron CPUs, 2GB of RAM per OSD, and plenty of RAM on the KVM host servers.

  osd pool default size = 2
  osd pool default min size = 2
  osd pool default pg num = 512
  osd pool default pgp num = 512

lsblk -t inside the VM:

  NAME  ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
  sdb           0    512      0     512     512    0 noop      128  0    2G
  loop0         0    512      0     512     512    0           128  0    0B

Some numbers:

  rados bench -p libvirt-pool 10 write:  avg MB/s 339.508, avg lat 0.186789
  rados bench -p libvirt-pool 100 rand:  avg MB/s 1111.42, avg lat 0.0534118

Random small-file read (fio, 4k random read inside the VM):
  avg 2246KB/s, avg lat 1708usec, 600 IOPS

Sequential small-file read with readahead (fio, 4k sequential read inside the VM):
  avg 308351KB/s, avg lat 11usec, 55k IOPS

The RBD images are attached with virtio-scsi (no difference using virtio), and the guest block devices have 4M readahead set (no difference if disabled). The RBD cache is enabled on server and client (no difference if disabled), and forcing RBD readahead makes no difference either.
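For completeness, here are the exact commands and settings behind the numbers above; image names and paths are illustrative rather than verbatim from our setup. The host-side baseline test is just a mapped image mounted directly (assuming the image already carries a filesystem):

  rbd map libvirt-pool/test-image       # appears as e.g. /dev/rbd0
  mount /dev/rbd0 /mnt
  echo 1 > /proc/sys/vm/drop_caches
  time cp -a /mnt/linux-src /dev/shm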
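The 4M guest readahead mentioned above is set along these lines (sdb being the virtio-scsi disk in the guest; the two forms are equivalent):

  # blockdev takes 512-byte sectors: 8192 * 512 = 4 MiB
  blockdev --setra 8192 /dev/sdb
  # or via sysfs, in KiB
  echo 4096 > /sys/block/sdb/queue/read_ahead_kb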
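The client-side RBD cache and readahead settings were toggled in ceph.conf, roughly as follows (these are the "enabled/forced" variants I tried, not a recommendation):

  [client]
  rbd cache = true
  rbd readahead trigger requests = 10
  rbd readahead max bytes = 4194304
  rbd readahead disable after bytes = 0   # never switch readahead off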
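The fio figures come from runs along these lines (parameters from memory; the random job uses direct I/O, while the sequential one is buffered so kernel readahead applies):

  fio --name=randread --rw=randread --bs=4k --size=1G \
      --ioengine=libaio --direct=1 --iodepth=1 --filename=fio-test
  fio --name=seqread --rw=read --bs=4k --size=1G \
      --ioengine=libaio --direct=0 --iodepth=1 --filename=fio-test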
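And the disks are attached to the guests with a libvirt definition roughly like this (monitor host and image name are placeholders, and auth is omitted):

  <controller type='scsi' model='virtio-scsi'/>
  <disk type='network' device='disk'>
    <driver name='qemu' type='raw' cache='writeback'/>
    <source protocol='rbd' name='libvirt-pool/guest-disk'>
      <host name='ceph-mon1' port='6789'/>
    </source>
    <target dev='sdb' bus='scsi'/>
  </disk>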
Please advise!

--
Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
https://fsf.org | https://gnu.org