Large rbd

Hi,

What limits are there on the "reasonable size" of an rbd?

E.g. when I try to create a 1 PiB rbd with default 4 MiB objects on my Octopus cluster:

$ rbd create --size 1P --data-pool rbd.ec rbd.meta/fs
2021-01-20T18:19:35.799+1100 7f47a99253c0 -1 librbd::image::CreateRequest: validate_layout: image size not compatible with object map

...which comes from:

== src/librbd/image/CreateRequest.cc
bool validate_layout(CephContext *cct, uint64_t size, file_layout_t &layout) {
  if (!librbd::ObjectMap<>::is_compatible(layout, size)) {
    lderr(cct) << "image size not compatible with object map" << dendl;
    return false;
  }

== src/librbd/ObjectMap.cc
template <typename I>
  bool ObjectMap<I>::is_compatible(const file_layout_t& layout, uint64_t size) {
    uint64_t object_count = Striper::get_num_objects(layout, size);
    return (object_count <= cls::rbd::MAX_OBJECT_MAP_OBJECT_COUNT);
  }

== src/cls/rbd/cls_rbd_types.h
static const uint32_t MAX_OBJECT_MAP_OBJECT_COUNT = 256000000;

For 4 MiB objects that object count equates to just over 976 TiB, i.e. a little under the 1 PiB I asked for, hence the error.
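
For reference, a quick back-of-the-envelope check (plain bc, nothing Ceph-specific):

$ echo "scale=4; 256000000 * 4 / 1024 / 1024" | bc   # max objects * 4 MiB, in TiB
976.5625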

Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or is it just "this is crazy large, if you're trying to go over this you're doing something wrong, rethink your life..."?

Yes, I realise I can increase the size of the objects to get a larger rbd, or drop the object-map support (and the fast-diff that goes along with it).
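
E.g. something like one of these should get past that check (same pool/image names as above, untested):

# 8 MiB objects roughly double the maximum object-map-compatible size (~1953 TiB)
$ rbd create --size 1P --object-size 8M --data-pool rbd.ec rbd.meta/fs

# or keep 4 MiB objects but create with only layering, i.e. no object-map/fast-diff
$ rbd create --size 1P --image-feature layering --data-pool rbd.ec rbd.meta/fs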

I'm SO glad I found this limit now rather than starting on a smaller rbd and finding the limit when I tried to grow the rbd underneath a rapidly filling filesystem.

What else should I know?

Background: I currently have nearly 0.5 PB on XFS (on lvm / raid6) and ZFS that I'm looking to move over to ceph. XFS is a requirement for the reflinking (sadly not yet available in CephFS: https://tracker.ceph.com/issues/1680). The recommendation for XFS is to start large on a thin-provisioned store (hello rbd!) rather than start small and grow as needed - e.g. see the thread surrounding:

https://www.spinics.net/lists/linux-xfs/msg20099.html

Rather than a single large rbd, should I be looking at multiple smaller rbds linked together using lvm or somesuch? What are the tradeoffs?
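
To be concrete, what I have in mind is something like this (hypothetical sizes, names and device paths, untested):

$ rbd create --size 500T --data-pool rbd.ec rbd.meta/fs0
$ rbd create --size 500T --data-pool rbd.ec rbd.meta/fs1
$ rbd map rbd.meta/fs0    # -> /dev/rbd0 (device names assumed)
$ rbd map rbd.meta/fs1    # -> /dev/rbd1
$ pvcreate /dev/rbd0 /dev/rbd1
$ vgcreate vg_fs /dev/rbd0 /dev/rbd1
$ lvcreate -l 100%FREE -n lv_fs vg_fs   # linear concatenation across both rbds
$ mkfs.xfs /dev/vg_fs/lv_fs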

And whilst we're here... for an rbd with the data on an erasure-coded pool, how do you calculate the amount of rbd metadata required if/when the rbd data is fully allocated?


Cheers,

Chris


