Dear Vitaliy, dear Paul,

Changing the block size for "dd" makes a huge difference. However, some things are still not fully clear to me:

As recommended, I tried writing/reading directly to the RBD image, and this is blazingly fast:

fio -ioengine=rbd -name=test -direct=1 -rw=read -bs=4M -iodepth=16 -pool=SATA -rbdname=vm-100-disk-0

write: IOPS=40, BW=160MiB/s (168MB/s)(4096MiB/25529msec)
read: IOPS=135, BW=542MiB/s (568MB/s)(4096MiB/7556msec)

(the write result is from the same command with -rw=write)

When I do the same within the virtual machine, I get the following results:

fio --filename=/dev/vdb -name=test -direct=1 -rw=write -bs=4M -iodepth=16
(varying -rw and -bs for the individual runs)

-------- cache = writeback ----------
Block size 4M:
read : io=4096.0MB, bw=97640KB/s, iops=23, runt= 42957msec
write: io=4096.0MB, bw=85250KB/s, iops=20, runt= 49200msec

Block size 4k:
read : io=4096.0MB, bw=3988.6KB/s, iops=997, runt=1051599msec
write: io=4096.0MB, bw=14529KB/s, iops=3632, runt=288686msec
-------------------------------------

The speeds are much slower than on the bare RBD image, and I don't really know why. Nevertheless, they still seem reasonable, although it is strange that there is no difference between "cache=unsafe" and "cache=writeback". Moreover, I find it strange that reading with 4k blocks is that slow, while writing is still OK.

My virtual machine is Debian 8 with a paravirtualized block device (/dev/vdb); the process (and its qemu parameters) looks like this:

root 1854681 18.3 8.0 5813032 1980804 ? Sl 16:00 20:55 /usr/bin/kvm -id 100 -name bya-backend -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/100.pid -daemonize -smbios type=1,uuid=e53d6e2d-708e-4511-bb26-b0f1aefd81c6 -smp 4,sockets=1,cores=4,maxcpus=4 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/100.vnc,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 4096 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:7c6dc4c7e9f -drive if=none,id=drive-ide2,media=cdrom,aio=threads -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive file=/mnt/pve/pontos-images/images/100/vm-100-disk-2.raw,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -drive file=rbd:SATA/vm-100-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/SATA.keyring,if=none,id=drive-virtio1,cache=writeback,format=raw,aio=threads,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -netdev type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown -device e1000,mac=46:22:36:C3:37:7E,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -machine type=pc

In case you have any further speed-up hints for me, I would be glad to hear them. In any case, thank you a lot for your help!

Best Regards,
Hermann
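P.S.: One more detail I noticed while writing this up: the fio runs inside the VM do not specify an ioengine, so fio presumably falls back to its synchronous default (psync), where -iodepth=16 has no real effect. If that is the case, a fairer comparison against the rbd-engine numbers would be an explicitly asynchronous run, roughly like this (assuming libaio is available in the guest; -size=4G just caps the test at the same 4 GiB as the runs above):

fio --filename=/dev/vdb --name=test --direct=1 --ioengine=libaio --rw=read --bs=4k --iodepth=16 --size=4G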
On 05.11.19 at 12:49, Виталий Филиппов wrote:
> Yes, cache=unsafe has no effect with RBD.
>
> Hm, that's strange, you should get ~40*6 MB/s linear write with
> 6 HDDs and Bluestore.
>
> Try to create a test image and test it with 'fio -ioengine=rbd
> -name=test -direct=1 -rw=write -bs=4M -iodepth=16 -pool=<pool>
> -rbdname=<rbd>' from outside a VM.
>
> If you still get 4 MB/s, something's wrong with your ceph. If you get
> adequate performance, something's wrong with your VM settings.
>
> On 5 November 2019 14:31:38 GMT+03:00, Hermann Himmelbauer
> <hermann@xxxxxxx> wrote:
>> Hi,
>> Thank you for your quick reply. Proxmox offers me "writeback"
>> (cache=writeback) and "writeback unsafe" (cache=unsafe), however,
>> for my "dd" test this makes no difference at all.
>>
>> I still have write speeds of ~4.5 MB/s.
>>
>> Perhaps "dd" disables the write cache?
>>
>> Would it perhaps help to put the journal or something else on an SSD?
>>
>> Best Regards,
>> Hermann
>>
>> On 05.11.19 at 11:49, vitalif@xxxxxxxxxx wrote:
>>> Use `cache=writeback` QEMU option for HDD clusters, that should
>>> solve your issue
>>>
>>>> Hi,
>>>> I recently upgraded my 3-node cluster to Proxmox 6 / Debian 10 and
>>>> recreated my ceph cluster with a new release (14.2.4 Bluestore) -
>>>> basically hoping to gain some I/O speed.
>>>>
>>>> The installation went flawlessly, reading is faster than before
>>>> (~80 MB/s), however, the write speed is still really slow
>>>> (~3.5 MB/s).
>>>>
>>>> I wonder if I can do anything to speed things up?
>>>>
>>>> My hardware is as follows:
>>>>
>>>> 3 nodes with a Supermicro X8DTT-HIBQF mainboard each,
>>>> 2 OSDs per node (2TB SATA harddisks, WDC WD2000F9YZ-0),
>>>> interconnected via Infiniband 40
>>>>
>>>> The network should be reasonably fast, I measure ~16 GBit/s with
>>>> iperf, so this seems fine.
>>>>
>>>> I use ceph for RBD only, so my measurement is simply a very simple
>>>> "dd" read and write test within a virtual machine (Debian 8) like
>>>> the following:
>>>>
>>>> read:
>>>> dd if=/dev/vdb | pv | dd of=/dev/null
>>>> -> 80 MB/s
>>>>
>>>> write:
>>>> dd if=/dev/zero | pv | dd of=/dev/vdb
>>>> -> 3.5 MB/s
>>>>
>>>> When I do the same on the virtual machine on a disk that is on NFS
>>>> storage, I get about 30 MB/s for reading and writing.
>>>>
>>>> If I disable the write cache on all OSD disks via "hdparm -W 0
>>>> /dev/sdX", I gain a little bit of performance, the write speed is
>>>> then 4.3 MB/s.
>>>>
>>>> Thanks to your help from the list I plan to install a second ceph
>>>> cluster which is SSD-based (Samsung PM1725b) and should be much
>>>> faster, however, I still wonder if there is any way to speed up my
>>>> harddisk-based cluster?
>>>>
>>>> Thank you in advance for any help,
>>>>
>>>> Best Regards,
>>>> Hermann
>
> --
> With best regards,
>   Vitaliy Filippov

--
hermann@xxxxxxx
PGP/GPG: 299893C7 (on keyservers)

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx