Hi,
What limits are there on the "reasonable size" of an rbd?
E.g. when I try to create a 1 PB rbd with default 4 MiB objects on my
octopus cluster:
$ rbd create --size 1P --data-pool rbd.ec rbd.meta/fs
2021-01-20T18:19:35.799+1100 7f47a99253c0 -1 librbd::image::CreateRequest: validate_layout: image size not compatible with object
...which comes from:
== src/librbd/image/CreateRequest.cc
bool validate_layout(CephContext *cct, uint64_t size, file_layout_t &layout) {
  if (!librbd::ObjectMap<>::is_compatible(layout, size)) {
    lderr(cct) << "image size not compatible with object map" << dendl;
    return false;
  }
== src/librbd/ObjectMap.cc
template <typename I>
bool ObjectMap<I>::is_compatible(const file_layout_t& layout, uint64_t size) {
  uint64_t object_count = Striper::get_num_objects(layout, size);
  return (object_count <= cls::rbd::MAX_OBJECT_MAP_OBJECT_COUNT);
}
== src/cls/rbd/cls_rbd_types.h
static const uint32_t MAX_OBJECT_MAP_OBJECT_COUNT = 256000000;
For 4 MiB objects that object count equates to just over 976 TiB.
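To show where that figure comes from, here is a rough sketch of the arithmetic in Python. It is not the librbd code: it assumes default striping (one object per stripe), so that the object count is simply the image size divided by the object size, rounded up. The helper names are my own.

```python
# Limit quoted above from src/cls/rbd/cls_rbd_types.h
MAX_OBJECT_MAP_OBJECT_COUNT = 256_000_000

MiB = 1 << 20
TiB = 1 << 40
PiB = 1 << 50


def num_objects(image_size: int, object_size: int) -> int:
    """Object count, ASSUMING default striping (ceiling division)."""
    return -(-image_size // object_size)


def is_compatible(image_size: int, object_size: int) -> bool:
    """Mirror of the object-map check: the count must not exceed the limit."""
    return num_objects(image_size, object_size) <= MAX_OBJECT_MAP_OBJECT_COUNT


def max_image_size(object_size: int) -> int:
    """Largest image (in bytes) that still passes the check."""
    return MAX_OBJECT_MAP_OBJECT_COUNT * object_size


if __name__ == "__main__":
    print(max_image_size(4 * MiB) / TiB)   # limit for 4 MiB objects, in TiB
    print(num_objects(1 * PiB, 4 * MiB))   # objects needed for a 1 PiB image
    print(is_compatible(1 * PiB, 4 * MiB))
```

Under that assumption a 1 PiB image needs 2^28 = 268,435,456 objects, which is over the 256,000,000 limit, and the largest image that fits is 976.5625 TiB.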
Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or is it
just "this is crazy large, if you're trying to go over this you're doing
something wrong, rethink your life..."?
Yes, I realise I can increase the size of the objects to get a larger rbd,
or drop the object-map support (and the fast-diff that goes along with
it).
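For the "bigger objects" option, a quick sketch of how large the objects would need to be for a given image size. It assumes default striping again, and it assumes rbd accepts power-of-two object sizes up to 32 MiB (order 25), which is my understanding of the documented range and worth checking on your release. The function name and the `min_order`/`max_order` parameters are made up for illustration.

```python
from typing import Optional

MAX_OBJECT_MAP_OBJECT_COUNT = 256_000_000  # from src/cls/rbd/cls_rbd_types.h


def min_object_size(image_size: int,
                    min_order: int = 22,   # 4 MiB, the default object size
                    max_order: int = 25    # ASSUMED upper bound of 32 MiB
                    ) -> Optional[int]:
    """Smallest power-of-two object size (bytes) that keeps the object
    count within the object-map limit, or None if no size in range does.
    ASSUMES default striping: count = ceil(image_size / object_size)."""
    for order in range(min_order, max_order + 1):
        object_size = 1 << order
        count = -(-image_size // object_size)  # ceiling division
        if count <= MAX_OBJECT_MAP_OBJECT_COUNT:
            return object_size
    return None


if __name__ == "__main__":
    PiB = 1 << 50
    for size_pib in (1, 2, 4, 8):
        result = min_object_size(size_pib * PiB)
        label = f"{result >> 20} MiB" if result else "none in range"
        print(f"{size_pib} PiB image -> minimum object size: {label}")
```

If those assumptions hold, a 1 PiB image fits with 8 MiB objects. With 32 MiB objects the ceiling is 256,000,000 x 32 MiB, or about 7.6 PiB, so an 8 PiB image would not fit with the object map enabled.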
I'm SO glad I found this limit now rather than starting on a smaller rbd
and finding the limit when I tried to grow the rbd underneath a rapidly
filling filesystem.
What else should I know?
Background: I currently have nearly 0.5 PB on XFS (on lvm / raid6) and ZFS
that I'm looking to move over to ceph. XFS is a requirement, for the
reflinking (sadly not yet available in CephFS: https://tracker.ceph.com/issues/1680).
The recommendation for XFS is to start larger, on a thin-provisioned store
(hello rbd!), rather than start smaller and grow as needed - e.g. see the
thread surrounding:
https://www.spinics.net/lists/linux-xfs/msg20099.html
Rather than a single large rbd, should I be looking at multiple smaller
rbds linked together using lvm or somesuch? What are the tradeoffs?
And whilst we're here... for an rbd with the data on an erasure-coded
pool, how do you calculate the amount of rbd metadata required if/when the
rbd data is fully allocated?
Cheers,
Chris