RBD single process read performance

Hi,

I've been working with a Ceph 0.56.4 setup and I've been seeing some RBD read performance issues with single processes / threads.

The setup is:
- 36 OSDs (2TB WD RE drives)
- 9 hosts (4 OSDs per host)
- 120GB Intel SSD as a journal per host
- 32GB Ram per host
- Quad Core Xeon CPU (E3-1220 V2 @ 3.10GHz)
- 2Gbit LACP link

The client (3.8.8 kernel) in this case is a single node connected with 20Gbit LACP to the same switches.

To sum it up, with "rados bench" I'm seeing about 918MB/sec read (LACP doesn't balance well with one client) and 400MB/sec write.

Note: 2 RADOS bench processes with 64 threads each.

While running those RADOS benches neither the disks nor the SSDs are really busy, so it seems this can be tuned a bit further.

The problem is that when using either kernel RBD or librbd, reads are a lot slower than writes in a single process:

dd if=/dev/zero of=/dev/rbd1 bs=4M count=1024: 290MB/sec
dd if=/dev/rbd1 of=/dev/null bs=4M count=1024: 65MB/sec

When running multiple writers I max out at somewhere around 400MB/sec, the same as RADOS bench was telling me, but the reads go up to 300MB/sec when running multiple readers.

Each dd instance still achieves only about 60MB/sec, but with 5 readers the total comes to somewhere around 300MB/sec.
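A rough Python sketch of the multi-reader test, for comparing one reader against several on the same device. This is not what was run above (that was plain dd); the function names `read_range` and `parallel_read` are made up for illustration, and the sketch assumes each reader gets its own contiguous slice of the device. It does not bypass the client page cache, so repeated runs over the same range may be skewed.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # 4M, matching bs=4M in the dd runs


def read_range(path, offset, length, block_size=BLOCK_SIZE):
    """Sequentially read `length` bytes starting at `offset`; return bytes read."""
    done = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while done < length:
            chunk = os.pread(fd, min(block_size, length - done), offset + done)
            if not chunk:  # reached the end of the device/file early
                break
            done += len(chunk)
    finally:
        os.close(fd)
    return done


def parallel_read(path, total_bytes, readers, block_size=BLOCK_SIZE):
    """Split [0, total_bytes) into `readers` contiguous slices and read them
    concurrently, one thread per slice. Returns (bytes_read, elapsed_seconds)."""
    slice_len = -(-total_bytes // readers)  # ceiling division
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=readers) as pool:
        futures = []
        for i in range(readers):
            off = i * slice_len
            length = max(0, min(slice_len, total_bytes - off))
            futures.append(pool.submit(read_range, path, off, length, block_size))
        bytes_read = sum(f.result() for f in futures)
    return bytes_read, time.monotonic() - start


# Example (hypothetical invocation, needs read access to the device):
#   size = 1024 * BLOCK_SIZE                      # same amount as count=1024
#   for n in (1, 5):
#       got, secs = parallel_read("/dev/rbd1", size, n)
#       print(n, "readers:", got / secs / 1e6, "MB/sec")
```

Threads are enough here because `os.pread` releases the interpreter lock while it waits on I/O, so the readers really do overlap.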

I changed the following settings:

osd op threads = 8
journal aio = true

The AIO journal showed a huge increase in write performance as expected, but increasing the op threads didn't change that much. Going from 2 (default) to 4 gave me about 5MB/sec and going to 8 added another 3MB/sec.

Since I'm hitting the same RBD image over and over, I'd expected these blocks to be in the page cache on the OSD hosts and the read speeds to come close to line rate.

The big difference seems to be in the number of threads. I noticed the same with RADOS bench. With a smaller number of threads I wouldn't get to the 918MB/sec and I had to spawn multiple processes to get there.
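To put a number on the thread-count effect, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that a single reader keeps exactly one 4M request in flight and that throughput = requests in flight * request size / latency. Readahead and the way an image is spread over objects make the real picture more complicated, so treat the results as a rough indication only. The helper names are hypothetical.

```python
def implied_latency_ms(throughput_mb_s, request_mb, in_flight=1):
    """Per-request latency implied by
    throughput = in_flight * request_mb / latency."""
    return in_flight * request_mb / throughput_mb_s * 1000.0


def requests_needed(target_mb_s, request_mb, latency_ms):
    """Concurrent requests needed to reach target_mb_s if every request
    takes latency_ms (assumed constant under load)."""
    return target_mb_s * (latency_ms / 1000.0) / request_mb


latency = implied_latency_ms(65, 4)  # single dd reader, bs=4M, 65MB/sec
print(f"implied latency per 4M request: {latency:.1f} ms")
print(f"requests in flight for 918MB/sec: {requests_needed(918, 4, latency):.1f}")
```

Under these assumptions a single reader at 65MB/sec corresponds to roughly 60 ms per 4M request, and reaching the 918MB/sec that RADOS bench showed would take around 14 such requests in flight at once.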

Still, 65MB/sec read per RBD device doesn't seem like a lot.

I also tried librbd, but that gives similar read performance to kernel RBD.

The end-goal is to run with librbd (OpenStack), but for now I just want to crank up the read performance of a single process.

I found multiple threads regarding read performance. One showed that AMD systems were a problem because of HyperTransport, but since these are Intel systems that isn't the case here.

Any suggestions? I'm not trying to touch any kernel settings (yet) since the RADOS bench shows me a pretty high read performance.

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



