Have you tried blktrace to determine whether the I/O patterns hitting the
rbd-backed virtio-scsi block device differ between the two cases (direct
vs. indirect through the loop device)?

On Tue, Jun 27, 2017 at 3:17 PM, Ruben Rodriguez <ruben@xxxxxxx> wrote:
>
> We are setting up a new set of servers to run the FSF/GNU infrastructure,
> and we are seeing strange behavior. Inside a QEMU guest, reading small
> files from a mounted rbd image is very slow. The "real world" test I use
> is to copy the Linux source tree from the filesystem to /dev/shm. On the
> host server that takes ~10 seconds from a mapped rbd image, but in the VM
> it takes over a minute. The same test takes <20 seconds when the VM
> storage is local LVM. Writing the files to the rbd-mounted disk also
> takes ~10 seconds.
>
> I suspect a problem with readahead and caching, so as a test I copied
> those same files into a loop device inside the VM (stored on the same
> rbd image); reading them takes ~10 seconds. I drop the caches before
> each test.
>
> This is how I run that test:
>
> dd if=/dev/zero of=test bs=1G count=5
> mkfs.xfs test
> mount test /mnt
> cp linux-src /mnt -a
> echo 1 > /proc/sys/vm/drop_caches
> time cp /mnt/linux-src /dev/shm -a
>
> I've tested many different parameters (readahead, partition alignment,
> filesystem formatting, block queue settings, etc.) with little change in
> performance. Wrapping the files in a loop device seems to change things
> in a way that I cannot otherwise replicate at the upper layers.
>
> Is this expected, or am I doing something wrong?
>
> Here are the specs:
> Ceph 10.2.7 on an Ubuntu Xenial derivative. Kernel 4.4, QEMU 2.5.
> 2 Ceph servers, each running 6x 1TB SSD OSDs.
> 2 QEMU/KVM servers managed with libvirt.
> All connected with 20GbE (bonded). Every server has 2x 16-core Opteron
> CPUs, 2GB RAM per OSD, and plenty of RAM on the KVM host servers.
>
> osd pool default size = 2
> osd pool default min size = 2
> osd pool default pg num = 512
> osd pool default pgp num = 512
>
> lsblk -t
> NAME  ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
> sdb           0    512      0     512     512    0 noop      128  0    2G
> loop0         0    512      0     512     512    0           128  0    0B
>
> Some numbers:
> rados bench -p libvirt-pool 10 write: avg MB/s 339.508, avg lat 0.186789
> rados bench -p libvirt-pool 100 rand: avg MB/s 1111.42, avg lat 0.0534118
> Random small-file read:
> fio 4k random read inside the VM: avg=2246KB/s, avg lat 1708usec, 600 IOPS
> Sequential small-file read with readahead:
> fio 4k sequential read inside the VM: avg=308351KB/s, avg lat 11usec, 55k IOPS
>
> The rbd images are attached with virtio-scsi (no difference using
> virtio), and the guest block devices have 4M readahead set (no difference
> if disabled). The rbd cache is enabled on both server and client (no
> difference if disabled). Forcing rbd readahead makes no difference
> either.
>
> Please advise!
> --
> Ruben Rodriguez | Senior Systems Administrator, Free Software Foundation
> GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
> https://fsf.org | https://gnu.org

--
Jason
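
For reference, a minimal sketch of the blktrace comparison suggested above,
run inside the guest. The device names come from the quoted lsblk output
(sdb for the rbd-backed virtio-scsi disk, loop0 for the loop device); the
source-tree paths and trace duration are assumptions, not known details of
the setup.

# trace the rbd-backed virtio-scsi disk while repeating the slow copy
blktrace -d /dev/sdb -o direct -w 120 &
echo 1 > /proc/sys/vm/drop_caches
time cp /srv/linux-src /dev/shm -a    # assumed location on the rbd-backed fs
wait

# trace the loop device while repeating the fast copy (loop file mounted on /mnt)
blktrace -d /dev/loop0 -o indirect -w 120 &
echo 1 > /proc/sys/vm/drop_caches
time cp /mnt/linux-src /dev/shm -a
wait

# turn the traces into per-request listings plus binary dumps, then summarize
# with btt and compare request sizes, merging and latencies between the runs
blkparse -i direct -d direct.bin > direct.txt
blkparse -i indirect -d indirect.bin > indirect.txt
btt -i direct.bin > direct.btt
btt -i indirect.bin > indirect.btt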
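
The fio numbers quoted above could be reproduced with jobs roughly like the
following. This is only a sketch: the file name, size, runtime and I/O
engine are assumptions, not the exact jobs that produced those results.

# 4k random reads against a test file on the rbd-backed filesystem
fio --name=randread --filename=/srv/fio.test --size=4G --bs=4k --rw=randread \
    --ioengine=libaio --direct=1 --iodepth=1 --runtime=60 --time_based

# 4k sequential reads, where readahead and page-cache hits can help
fio --name=seqread --filename=/srv/fio.test --size=4G --bs=4k --rw=read \
    --ioengine=libaio --direct=0 --iodepth=1 --runtime=60 --time_based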
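
"Forcing rbd readahead" on the client side is normally done through ceph.conf
options along these lines; the values shown are illustrative, not the settings
actually used in the setup described above.

[client]
rbd cache = true
rbd readahead trigger requests = 1       # start readahead after one sequential request
rbd readahead max bytes = 4194304        # allow up to 4MB of readahead
rbd readahead disable after bytes = 0    # never switch readahead off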