Re: rbd striping

Joe Buck <jbbuck@xxxxxxxxx> · Thu, 29 Aug 2013 09:12:14 -0700

The short answer is that if you use an approach like the one you suggest
and then alter the cluster in any way (adding or removing a node), the
ensuing rebalancing will move most of your data. CRUSH was designed to
limit data movement when cluster membership changes.
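
To make the data-movement point concrete, here is a rough Python sketch
that compares a naive "hash modulo node count" placement with rendezvous
(highest-random-weight) hashing. Rendezvous hashing is only a stand-in
chosen for brevity; it is not CRUSH, but it shares the property of
limiting movement when membership changes. The helper names, the OSD
names and the object names are all made up for illustration.

```python
import hashlib


def h(*parts):
    """Stable 64-bit hash of the given parts (hypothetical helper)."""
    data = ":".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place_modulo(obj, nodes):
    # Naive placement: hash the object, reduce modulo the node count.
    return nodes[h(obj) % len(nodes)]


def place_rendezvous(obj, nodes):
    # Rendezvous hashing (a stand-in, NOT CRUSH): score every
    # (object, node) pair and pick the node with the highest score.
    return max(nodes, key=lambda n: h(obj, n))


def moved_fraction(place, objects, before, after):
    # Fraction of objects whose node differs between the two node lists.
    moved = sum(1 for o in objects if place(o, before) != place(o, after))
    return moved / len(objects)


objects = [f"obj-{i}" for i in range(10000)]
before = [f"osd.{i}" for i in range(10)]
after = before + ["osd.10"]  # add one node to a ten-node cluster

naive = moved_fraction(place_modulo, objects, before, after)
hrw = moved_fraction(place_rendezvous, objects, before, after)
print(f"modulo placement moved {naive:.0%} of objects")
print(f"rendezvous placement moved {hrw:.0%} of objects")
```

Going from 10 to 11 nodes, the modulo scheme should relocate roughly
10/11 of the objects, while the rendezvous scheme should relocate roughly
1/11, and only onto the new node.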

Here's a link to the CRUSH paper, which goes into more detail:
http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf

Best,
-Joe Buck

On 08/29/2013 12:57 AM, Corin Langosch wrote:
Hi there,

I read about how rbd striping works at 
http://ceph.com/docs/next/man/8/rbd/ and it seems rather complex to 
me. Since the individual objects are already placed pseudo-randomly 
across all OSDs by CRUSH, what's the benefit over simply calculating 
object_id = (position / chunk_size).to_i, or even faster, object_id 
= position >> order?
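
For comparison, the two mappings can be sketched in Python as below. The
striped version follows the stripe unit / stripe count layout as I
understand it from the Ceph documentation; the function and parameter
names are hypothetical and this is not the librbd code.

```python
def simple_object_id(position, order):
    # The mapping from the question: one object per 2**order bytes.
    return position >> order


def striped_location(position, stripe_unit, stripe_count, object_size):
    """Map a byte position to (object number, offset inside that object).

    Assumes object_size is a multiple of stripe_unit.
    """
    stripes_per_object = object_size // stripe_unit
    block_no = position // stripe_unit          # which stripe unit overall
    stripe_no = block_no // stripe_count        # which row of stripe units
    stripe_pos = block_no % stripe_count        # which column (object in set)
    object_set_no = stripe_no // stripes_per_object
    object_no = object_set_no * stripe_count + stripe_pos
    offset = (stripe_no % stripes_per_object) * stripe_unit \
        + position % stripe_unit
    return object_no, offset


KiB, MiB = 1 << 10, 1 << 20

# stripe_count = 1 and stripe_unit = object_size reduces to the plain shift.
pos = 9 * MiB + 123
print(simple_object_id(pos, 22), striped_location(pos, 4 * MiB, 1, 4 * MiB))

# With 64 KiB units over 4 objects, consecutive units rotate across objects.
for p in (0, 64 * KiB, 128 * KiB, 256 * KiB):
    print(p, striped_location(p, 64 * KiB, 4, 4 * MiB))
```

With a stripe count of 1 and a stripe unit equal to the object size, the
striped mapping gives the same object number as the shift, so the simple
formula is a special case of the general one.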

I also wonder what object size is recommended for VM images. I assume 
the default of 4 MB is not optimal; something bigger like 64 MB would 
be much better, as it would require far fewer objects (less overhead on 
the OSDs' filestores) and far fewer client-OSD round trips (reads from 
and writes to different RADOS objects) for most VM workloads. The 
distribution should still be OK, as most VM images are several GB and 
so still have several hundred or even thousands of objects at 64 MB 
each. Are there any benchmarks available for this? :)
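
As a quick sanity check on the object counts, here is a small Python
sketch. The 10 GiB image size is just an assumed example, and sizes are
taken as binary units (MiB, GiB).

```python
MiB, GiB = 1 << 20, 1 << 30


def object_count(image_bytes, object_bytes):
    # Ceiling division: a partially filled last object still counts.
    return -(-image_bytes // object_bytes)


image = 10 * GiB  # assumed example image size
for object_size in (4 * MiB, 64 * MiB):
    n = object_count(image, object_size)
    print(f"{object_size // MiB} MiB objects: {n} objects")
```

For this assumed image, that is 2560 objects at 4 MiB versus 160 objects
at 64 MiB.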

Cheers,
Corin

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
