write speed issue on RBD image

Did you try those dd statements with oflag=direct? For example:


dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct; dd
if=disk-test of=/dev/null bs=1048576 iflag=direct; /bin/rm disk-test

That way you bypass the host page cache, so every write has to go
straight to the disk and be acknowledged before dd continues.

Then compare the performance numbers to see whether they change, and
whether the slow VMs behave any differently. You could also append an
& to run the commands in the background and then immediately run
$ dstat --all to see how much data is sent/received over the network
and how much is written to disk locally.
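
Something along these lines, for example (the scratch filename and
polling interval are just an illustration):

# inside the guest: direct-I/O write test running in the background
dd if=/dev/zero of=disk-test bs=1048576 count=512 oflag=direct &
# watch CPU, disk and network activity while it runs
dstat --all 1

That way you can see whether the slow writes line up with network
traffic to the Ceph hosts or with local disk activity.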

Hope this helps. It would also be great if you could share a bit more
about the switches you are using, the firmware of the HBAs on the
hosts, whether they are blades or "traditional" servers, whether you
used any special options when formatting the XFS filesystems and/or
any special mount options, and which hypervisor you are using.
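
For the XFS and HBA part, something like the following on each Ceph
host would show what is actually in effect (the OSD mount path is just
an example):

# filesystem geometry and mount options of an OSD filesystem
xfs_info /var/lib/ceph/osd/ceph-0
grep ceph /proc/mounts
# HBA model, to look up its firmware/driver
lspci | grep -i -e sas -e raid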



Best regards,


German Anders
Field Storage Support Engineer
Despegar.com - IT Team

> --- Original message ---
> Subject: write speed issue on RBD image
> From: Russell E. Glaue <rglaue at cait.org>
> To: <ceph-users at lists.ceph.com>
> Date: Wednesday, 02/04/2014 15:12
>
> Can someone recommend some testing I can do to further investigate why
> this slow-disk-write issue in the VM OS is occurring?
> It seems the issue, detailed below, is perhaps related to the VM OS
> running on RADOS images in Ceph.
>
>
> Issue:
> I have a handful (about 10) of VMs running that, when tested, report
> slow disk write speeds of 8MB/s-30MB/s. All of the remaining VMs (about
> 40) report fast disk write speeds averaging 800MB/s-1.0GB/s.
> There are no VMs reporting disk write speeds in between these
> numbers. Restarting the OS on any of the VMs does not resolve the
> issue.
>
> After these tests, I took one of the VMs (image02host) with slow disk
> write speed and reinstalled the basic OS, including repartitioning the
> disk. I used the same RADOS image. After this, I retested this VM
> (image02host) and all the other VMs with slow disk write speed. The
> VM I reinstalled the OS on (image02host) no longer has slow disk
> write speeds. And, surprisingly, one of the other VMs (another-host)
> with slow disk write speed started having fast write speeds. All the
> other VMs with slow disk write speed stayed the same.
>
> So, I do not necessarily believe the slow disk issue is directly
> related to any kind of bug or outstanding issue with Ceph/RADOS. I
> only have a couple of guesses at this point:
> 1. Perhaps my OS install (or possibly its configuration) is somehow
> at fault. I don't see how this is possible, however. All the VMs I
> have tested were kick-started with the same disk and OS
> configuration. So they are virtually identical, yet some of them have
> fast and some have slow disk write speeds.
> 2. Perhaps I have some bad sectors or a hard drive error at the
> hardware level that is causing the issue. Perhaps the RADOS images of
> this handful (about 10) of VMs are being written across a bad part of
> a hard drive. This seems more likely to me. However, all drives across
> all Ceph hosts are reporting good health.
>
> So, now, I have come to the ceph-users list to ask for help. What are
> some things I can do to test whether there is possibly a bad sector or
> hardware error on one of the hard drives, or some issue with Ceph
> writing to part of one of the hard drives? Or are there any other
> tests I can run to help determine possible issues?
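>
> (A couple of checks that might be relevant here; the device name below
> is only a placeholder:)
>
> # SMART health of each OSD drive, run on each Ceph host
> smartctl -a /dev/sdb
> # per-OSD commit/apply latency as reported by the cluster
> ceph osd perf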
>
> And, secondly, if I wanted to move a RADOS image to new OSD blocks, is
> there a way to do that without exporting and importing the image?
> Perhaps, by resplattering the image and testing again to see if the
> issue is resolved, I can determine whether the existing slow disk
> write speed issue comes down to how the image is splattered across
> OSDs - indicating a bad OSD hard drive, or bad parts of an OSD hard
> drive.
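>
> (One possibility, sketched with example names: an in-pool copy
> rewrites the data to newly mapped objects, and ceph osd map shows
> which OSDs a given RBD data object lands on:)
>
> rbd cp images/osimage02 images/osimage02-copy
> ceph osd map images rbd_data.2c1a2ae8944a.0000000000000000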
>
>
> Ceph Configuration:
> * Ceph Version 0.72.2
> * Three Ceph hosts, CentOS 6.5 OS, using XFS
> * All connected via 10GbE network
> * KVM/QEMU Virtualization, with Ceph support
> * Virtual Machines are all RHEL 5.9 32bit
> * Our Ceph setup is very basic. One pool for all VM disks, all drives 
> on all Ceph hosts are in that pool.
> * Ceph Caching is on:
>                rbd cache = true
>                rbd cache size = 128
>                rbd cache max dirty = 64
>                rbd cache target dirty = 64
>                rbd cache max dirty age = 10.0
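>
> (For reference, rbd cache size and the dirty limits are specified in
> bytes; a sketch using the upstream defaults, shown only for
> comparison, would be:)
>
>                rbd cache = true
>                rbd cache size = 33554432
>                rbd cache max dirty = 25165824
>                rbd cache target dirty = 16777216
>                rbd cache max dirty age = 1.0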
>
>
> Test:
> Here I provide the test results of two VMs that are running on the
> same Ceph host, using disk images from the same Ceph pool, and were
> cloned from the same RADOS snapshot. They both have the exact same KVM
> configuration. However, they report dramatically different write
> speeds. When I tested them both, they were running on the same Ceph
> host. In fact, I even ran the VM reporting slow disk write speed on a
> different Ceph host to test, and it still gave the same disk write
> speed results.
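>
> (It might also be worth benchmarking the pool directly from a Ceph
> host, outside any VM, to rule the cluster itself in or out; the pool
> name matches the one used above:)
>
> rados bench -p images 30 write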
>
> [root at linux]# rbd -p images info osimage01
> rbd image 'osimage01':
> size 28672 MB in 7168 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.2bfb74b0dc51
> format: 2
> features: layering
> [root at linux]# rbd -p images info osimage02
> rbd image 'osimage02':
> size 28672 MB in 7168 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.2c1a2ae8944a
> format: 2
> features: layering
>
> None of the images used are cloned.
>
> [root at linux]# ssh image01host
> image01host [65]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.760446 seconds, 706 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.214783 seconds, 2.5 GB/s
> image01host [66]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.514886 seconds, 1.0 GB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.198433 seconds, 2.7 GB/s
> image01host [67]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.562401 seconds, 955 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.223297 seconds, 2.4 GB/s
>
> [root at linux]# ssh image02host
> image02host [66]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 18.8284 seconds, 28.5 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.158142 seconds, 3.4 GB/s
> image02host [67]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 29.1494 seconds, 18.4 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.244414 seconds, 2.2 GB/s
> image02host [68]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 26.5817 seconds, 20.2 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.17213 seconds, 3.1 GB/s
>
>
> ((After reinstalling the OS on VM image02host using RADOS image 
> osimage02))
> [root at image02host tmp]# dd if=/dev/zero of=disk-test bs=1048576 
> count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.453372 seconds, 1.2 GB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.145874 seconds, 3.7 GB/s
> [root at image02host tmp]# dd if=/dev/zero of=disk-test bs=1048576 
> count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.591697 seconds, 907 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.175544 seconds, 3.1 GB/s
> [root at image02host tmp]# dd if=/dev/zero of=disk-test bs=1048576 
> count=512; dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.599345 seconds, 896 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.164405 seconds, 3.3 GB/s
>
> ((As mentioned, surprisingly, this other host started having fast disk
> write speeds only after image02host was reinstalled. But I do not
> understand why this would be related.))
> another-host [65]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 7.88853 seconds, 68.1 MB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.273677 seconds, 2.0 GB/s
> # image02host was reinstalled before the next command was issued #
> another-host [66]% dd if=/dev/zero of=disk-test bs=1048576 count=512; 
> dd if=disk-test of=/dev/null bs=1048576; /bin/rm disk-test
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.533444 seconds, 1.0 GB/s
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 0.198121 seconds, 2.7 GB/s
>
>
>


