Have you tried blktrace to determine whether the I/O patterns hitting the
rbd-backed virtio-scsi block device differ between the two cases (direct
vs. indirect through the loop device)?

On Tue, Jun 27, 2017 at 3:17 PM, Ruben Rodriguez <ruben@xxxxxxx> wrote:
>
> We are setting up a new set of servers to run the FSF/GNU infrastructure,
> and we are seeing strange behavior. Inside a QEMU guest, reading small
> files from a mounted rbd image is very slow. The "real world" test I use
> is to copy the Linux source tree from the filesystem to /dev/shm. On the
> host server that takes ~10 seconds from a mapped rbd image, but in the VM
> it takes over a minute. The same test takes <20 seconds when the VM
> storage is local LVM. Writing the files to the rbd-mounted disk also
> takes ~10 seconds.
>
> I suspect a problem with readahead and caching, so as a test I copied
> those same files into a loop device inside the VM (stored on the same
> rbd image); reading them takes ~10 seconds. I drop the caches before
> each test.
>
> This is how I run that test:
>
> dd if=/dev/zero of=test bs=1G count=5
> mkfs.xfs test
> mount test /mnt
> cp linux-src /mnt -a
> echo 1 > /proc/sys/vm/drop_caches
> time cp /mnt/linux-src /dev/shm -a
>
> I've tested many different parameters (readahead, partition alignment,
> filesystem formatting, block queue settings, etc.) with little change in
> performance. Wrapping the files in a loop device seems to change things
> in a way that I cannot otherwise replicate at the upper layers.
>
> Is this expected, or am I doing something wrong?
>
> Here are the specs:
> Ceph 10.2.7 on an Ubuntu Xenial derivative. Kernel 4.4, QEMU 2.5.
> 2 Ceph servers, each running 6x 1TB SSD OSDs.
> 2 QEMU/KVM servers managed with libvirt.
> All connected with 20GbE (bonded). Every server has 2x 16-core Opteron
> CPUs, 2GB RAM per OSD, and plenty of RAM on the KVM host servers.
>
> osd pool default size = 2
> osd pool default min size = 2
> osd pool default pg num = 512
> osd pool default pgp num = 512
>
> lsblk -t
> NAME  ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
> sdb           0    512      0     512     512    0 noop      128  0    2G
> loop0         0    512      0     512     512    0           128  0    0B
>
> Some numbers:
> rados bench -p libvirt-pool 10 write: avg MB/s 339.508, avg lat 0.186789
> rados bench -p libvirt-pool 100 rand: avg MB/s 1111.42, avg lat 0.0534118
> Random small-file read:
> fio 4k random read inside the VM: avg=2246KB/s, avg lat 1708usec, 600 IOPS
> Sequential small-file read with readahead:
> fio 4k sequential read inside the VM: avg=308351KB/s, avg lat 11usec, 55k IOPS
>
> The rbd images are attached with virtio-scsi (no difference using
> virtio), and the guest block devices have 4M readahead set (no difference
> if disabled). The rbd cache is enabled on both server and client (no
> difference if disabled). Forcing rbd readahead makes no difference
> either.
>
> Please advise!
> --
> Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
> GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
> https://fsf.org | https://gnu.org

--
Jason
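
For reference, a minimal sketch of the blktrace comparison suggested above,
run inside the guest. The device names come from the quoted lsblk output
(sdb for the rbd-backed virtio-scsi disk, loop0 for the loop device); the
source-tree paths and trace duration are assumptions, not known details of
the setup.

# trace the rbd-backed virtio-scsi disk while repeating the slow copy
blktrace -d /dev/sdb -o direct -w 120 &
echo 1 > /proc/sys/vm/drop_caches
time cp /srv/linux-src /dev/shm -a    # assumed location on the rbd-backed fs
wait

# trace the loop device while repeating the fast copy (loop file mounted on /mnt)
blktrace -d /dev/loop0 -o indirect -w 120 &
echo 1 > /proc/sys/vm/drop_caches
time cp /mnt/linux-src /dev/shm -a
wait

# turn the traces into per-request listings plus binary dumps, then summarize
# with btt and compare request sizes, merging and latencies between the runs
blkparse -i direct -d direct.bin > direct.txt
blkparse -i indirect -d indirect.bin > indirect.txt
btt -i direct.bin > direct.btt
btt -i indirect.bin > indirect.btt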
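
The fio numbers quoted above could be reproduced with jobs roughly like the
following. This is only a sketch: the file name, size, runtime and I/O
engine are assumptions, not the exact jobs that produced those results.

# 4k random reads against a test file on the rbd-backed filesystem
fio --name=randread --filename=/srv/fio.test --size=4G --bs=4k --rw=randread \
    --ioengine=libaio --direct=1 --iodepth=1 --runtime=60 --time_based

# 4k sequential reads, where readahead and page-cache hits can help
fio --name=seqread --filename=/srv/fio.test --size=4G --bs=4k --rw=read \
    --ioengine=libaio --direct=0 --iodepth=1 --runtime=60 --time_based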
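
"Forcing rbd readahead" on the client side is normally done through ceph.conf
options along these lines; the values shown are illustrative, not the settings
actually used in the setup described above.

[client]
rbd cache = true
rbd readahead trigger requests = 1       # start readahead after one sequential request
rbd readahead max bytes = 4194304        # allow up to 4MB of readahead
rbd readahead disable after bytes = 0    # never switch readahead off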