The short answer is that if you place data with a scheme like the one
you suggest and then change the cluster in any way (add or remove a
node), the ensuing rebalancing will move most of your data. CRUSH was
designed to limit data movement when cluster membership changes.
Here's a link to the CRUSH paper, which goes into more detail:
http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf
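
As a rough illustration of the data-movement point (this is not CRUSH
itself), here is a small Python sketch that compares naive hash-modulo
placement with rendezvous hashing, a much simpler technique that shares
CRUSH's goal of limiting movement when membership changes. The helper
names, object names, and OSD names are all made up for the example.

```python
import hashlib


def stable_hash(*parts):
    """Deterministic 64-bit hash of the given parts (hypothetical helper)."""
    data = "/".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place_modulo(obj, nodes):
    # Naive placement: hash the object name modulo the number of nodes.
    return nodes[stable_hash(obj) % len(nodes)]


def place_rendezvous(obj, nodes):
    # Rendezvous (highest-random-weight) hashing: score every node for this
    # object and pick the highest score. A stand-in for CRUSH, not CRUSH.
    return max(nodes, key=lambda n: stable_hash(obj, n))


def moved_fraction(place, objects, before, after):
    # Fraction of objects whose node differs between the two cluster layouts.
    moved = sum(1 for o in objects if place(o, before) != place(o, after))
    return moved / len(objects)


objects = [f"obj-{i}" for i in range(10000)]
before = [f"osd.{i}" for i in range(10)]
after = before + ["osd.10"]  # add one node to the cluster

modulo_moved = moved_fraction(place_modulo, objects, before, after)
hrw_moved = moved_fraction(place_rendezvous, objects, before, after)
print(f"modulo: {modulo_moved:.0%} moved, rendezvous: {hrw_moved:.0%} moved")
```

Going from 10 to 11 nodes, the modulo scheme should relocate roughly 10
in 11 objects, while rendezvous hashing should relocate roughly 1 in 11,
namely the objects that end up on the new node.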
Best,
-Joe Buck
On 08/29/2013 12:57 AM, Corin Langosch wrote:
Hi there,
I read about how rbd striping works at
http://ceph.com/docs/next/man/8/rbd/ and it seems rather complex to
me. Since the individual objects are distributed pseudo-randomly across
all OSDs by CRUSH anyway, what is the benefit over simply calculating
object_id = (position / chunk_size).to_i, or even faster,
object_id = position >> order?
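
To make the two mappings concrete, here is a minimal Python sketch. The
simple version is just the shift from the question. The striped version
is one reading of the stripe_unit / stripe_count scheme and should be
treated as an assumption rather than as librbd's actual code; the
function and parameter names are invented for the example.

```python
MiB = 1024 * 1024


def simple_object_id(position, order=22):
    # order is log2 of the object size; 22 corresponds to 4 MiB objects.
    return position >> order


def striped_object_id(position, stripe_unit, stripe_count, object_size):
    # Assumed layout: stripe units are written round-robin across
    # stripe_count objects (an "object set") until those objects are
    # full, then the next object set begins.
    stripes_per_object = object_size // stripe_unit
    block_no = position // stripe_unit      # index of the stripe unit
    stripe_no = block_no // stripe_count    # which stripe (row)
    stripe_pos = block_no % stripe_count    # which object within the set
    object_set = stripe_no // stripes_per_object
    return object_set * stripe_count + stripe_pos


# With stripe_unit == object_size and stripe_count == 1 the two agree.
for pos in (0, 3 * MiB, 4 * MiB, 123 * MiB + 17):
    assert simple_object_id(pos) == striped_object_id(pos, 4 * MiB, 1, 4 * MiB)

# With 1 MiB stripe units over 2 objects, consecutive MiB alternate objects.
print([striped_object_id(i * MiB, 1 * MiB, 2, 4 * MiB) for i in range(10)])
```

Under these assumptions, sequential I/O with a stripe_count greater
than 1 is spread over several objects at once, which a plain shift
cannot do.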
I also wonder what object size is recommended for VM images. I assume
the default of 4 MB is not optimal and that something bigger, like
64 MB, would be much better, as it would require far fewer objects
(less overhead on the OSDs' filestores) and far fewer client-OSD
round trips (reads/writes from/to different RADOS objects) for most VM
workloads. The distribution should still be OK, as most VM images are
several GB and so still consist of several hundred or thousands of
objects at 64 MB per object. Are there any benchmarks available for
this? :)
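
For reference, the object counts in question are easy to work out. A
small Python sketch, with image sizes picked purely as examples:

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def object_count(image_bytes, object_bytes):
    # Number of objects needed to back the image, rounding up.
    return -(-image_bytes // object_bytes)


for image_gib in (10, 100):
    small = object_count(image_gib * GiB, 4 * MiB)
    large = object_count(image_gib * GiB, 64 * MiB)
    print(f"{image_gib} GiB image: {small} x 4 MiB or {large} x 64 MiB objects")
```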
Cheers,
Corin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com