I am doing some testing on our new ceph cluster:
- 3 ceph nodes (8 CPUs, 128 GB RAM, Ubuntu 12.04 + 3.13 kernel)
- 8 OSDs on each (i.e. 24 in total)
- 4 compute nodes (ceph clients)
- 10G networking
- ceph 0.86 (97dcc0539dfa7dac3de74852305d51580b7b1f82)
I'm running some fio tests from one of the compute nodes, and the
initial runs had us scratching our heads (a sketch of the kind of job
we're running is below the numbers):
- 4M write 700 MB/s (175 IOPS)
- 4M read 320 MB/s (81 IOPS)
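
In case it matters, the jobs are nothing exotic - roughly this shape,
here sketched against fio's rbd engine with placeholder pool/image
names and illustrative iodepth/runtime values, not our exact job file:

[4m-test]
# the rbd engine goes through librbd, so the [client] rbd_cache
# settings quoted below apply to it
ioengine=rbd
clientname=admin
pool=rbd            # placeholder pool name
rbdname=fio-test    # placeholder image name
rw=write            # rw=read for the read pass
bs=4M
iodepth=16
runtime=60
time_based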
Why was reading so much slower than writing? The [client] section of the
compute nodes' ceph.conf looks like:
rbd_cache = true
rbd_cache_size = 67108864
rbd_cache_max_dirty = 33554432
rbd_cache_writethrough_until_flush = true
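
(One way to double-check that librbd is actually picking these up is to
enable a client admin socket and dump the running config - a sketch,
where the socket path pattern and the pid are just examples:

[client]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.asok

ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok \
    config show | grep rbd_cache
)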
After quite a lot of mucking about with different cache sizes (up to 2G),
we thought to switch the rbd cache off:
rbd_cache = false
...and rerun the read test:
- 4M read 1450 MB/s (362 IOPS)
That is more like it! So I am confused about what is happening here. I
put 'Giant' in the subject, but I'm wondering whether our current
(Emperor) cluster is suffering from this as well (we are seeing what we
think is poor read performance there).
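
(If anyone wants to flip this per-run without editing ceph.conf, passing
the option through CEPH_ARGS should work for librados/librbd clients -
a sketch, with a hypothetical job file name:

CEPH_ARGS="--rbd_cache=false" fio 4m-read.fio   # cache off for this run
CEPH_ARGS="--rbd_cache=true"  fio 4m-read.fio   # cache back on
)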
I redid the read test, running fio from one of the ceph nodes to
mitigate network latency to some extent, and got the same result. I'm
*thinking* that rbd cache management is not keeping up with the read
stream.
FWIW I can't reproduce this with master (0.86-672-g5c051f5
5c051f5c0c640ddc9b27b7cab3860a899dc185cb) on a dev setup, but the
topology is very different (4 OSDs on 4 VMs, all of them actually on
one host with no real network), so it may not be a relevant data point.
Regards
Mark