On 08/29/2013 09:57 AM, Corin Langosch wrote:
Hi there,

I read about how striping of RBD works at http://ceph.com/docs/next/man/8/rbd/ and it seems rather complex to me. As the individual objects are placed pseudo-randomly over all OSDs, taking CRUSH into account anyway, what's the benefit over simply calculating object_id = (position / chunk_size).to_i, or even faster, object_id = position >> order?

I also wonder what object size is recommended for VM images. I assume the default of 4 MB is not optimal, and that something bigger like 64 MB would be much better, as it would require far fewer objects (less overhead on the OSDs' filestores) and far fewer client-OSD round trips (reads/writes from/to different RADOS objects) for most VM workloads. The distribution should still be OK, as most VM images are several GB and so would still consist of several hundred or thousand objects even with 64 MB objects.

Are there any benchmarks available for this? :)
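To make the comparison in the question concrete, here is a minimal Python sketch of both mappings. The function names are made up for illustration, and the striped variant is written from the stripe unit / stripe count / object size layout as I understand it from the documentation, not copied from the Ceph source, so treat it as an approximation.

```python
MiB = 1 << 20


def simple_object_no(offset, order=22):
    """Object index when every object holds one contiguous 2**order-byte chunk."""
    return offset >> order


def striped_location(offset, stripe_unit, stripe_count, object_size):
    """Map an image byte offset to (object_no, offset_in_object).

    Hypothetical helper: assumes stripe units are written round-robin across
    stripe_count objects (an "object set") until those objects are full.
    """
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit        # index of the stripe unit
    stripe_no = block_no // stripe_count    # which stripe (row) it is in
    stripe_pos = block_no % stripe_count    # which object within the set
    object_set_no = stripe_no // stripes_per_object
    object_no = object_set_no * stripe_count + stripe_pos
    offset_in_object = (stripe_no % stripes_per_object) * stripe_unit + offset % stripe_unit
    return object_no, offset_in_object
```

With stripe_count = 1 and stripe_unit equal to the object size, the striped mapping gives the same object number as position >> order, so the simple formula is a special case of it. With a smaller stripe unit and stripe_count > 1, consecutive stripe units land in different objects, which spreads a sequential stream over several objects at once.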
I am not aware of the reasoning behind the 4MB, but it seems like a "default" in a lot of situations. I can imagine other object sizes like the one you mention could work better in different situations. The number of objects is not really a problem for Ceph itself, but the underlying filesystem on the OSD would need fewer fopen() calls when you have fewer objects.
I have never tested with an object size other than 4MB, but it could well be that something else works better.
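For a rough feel of how the object count changes with object size, here is a small Python sketch. The helper name is made up, and the result is only an upper bound, assuming objects are created only for regions of the image that have actually been written.

```python
MiB = 1 << 20
GiB = 1 << 30


def object_count(image_size, order):
    """Upper bound on the number of objects backing an image.

    order is log2 of the object size in bytes (22 -> 4 MiB, 26 -> 64 MiB).
    """
    object_size = 1 << order
    # Round up so a partially filled last object is counted too.
    return (image_size + object_size - 1) // object_size


for order in (22, 26):
    size_mib = (1 << order) // MiB
    print(f"{size_mib} MiB objects: {object_count(10 * GiB, order)} objects for a 10 GiB image")
```

This only counts objects; it says nothing about throughput or latency, which would still need to be benchmarked.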
Cheers,
Corin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on