On Thu, Jan 21, 2021 at 10:57:49AM +0100, Robert Sander wrote:
> Hi,
> On 21.01.21 at 05:42, Chris Dunlop wrote:
>> Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or is
>> it just "this is crazy large, if you're trying to go over this you're
>> doing something wrong, rethink your life..."?
> IMHO the limit is there because of the way deletion of RBDs works. "rbd
> rm" has to look for every object, not only the ones that were really
> created. This would make deleting a very very large RBD take a very very
> long time.
I wouldn't have thought the ceph designers would put in a hard limit like
that just to protect people from a long delete time.
The removal time may well be a consideration for some, but it's not a
significant issue in this case as the filesystem is intended to last for years
(the XFS and ZFS it's meant to replace have been around for maybe a decade).
That said, it does take a while. For a 976T rbd (the largest possible w/
default 4M objects) with a small amount written to it (maybe 4T):
$ rbd info rbd.meta/fs
rbd image 'fs':
        size 976 TiB in 255852544 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 8126791dce2ad3
        data_pool: rbd.ec.data
        block_name_prefix: rbd_data.22.8126791dce2ad3
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool
        op_features:
        flags:
        create_timestamp: Thu Jan 21 14:03:38 2021
        access_timestamp: Thu Jan 21 14:03:38 2021
        modify_timestamp: Thu Jan 21 14:03:38 2021
$ time rbd remove rbd.meta/fs
real 117m31.183s
user 116m56.895s
sys 0m2.101s
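(For reference, the object count is just size / object size, and with 4M
objects a 976T image sits right at what I believe is the 256000000-object
MAX_OBJECT_MAP_OBJECT_COUNT limit; quick arithmetic using the numbers from the
info output above:)

$ echo $(( 976 * 1024 * 1024 / 4 ))    # 976 TiB expressed in 4 MiB objects
255852544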
The issue is the number of objects. For instance, the same size rbd (976T) but
created with "--object-size 16M":
$ rbd info rbd.meta/fs
rbd image 'fs':
        size 976 TiB in 63963136 objects
        order 24 (16 MiB objects)
        ...
$ time rbd remove rbd.meta/fs
real 7m23.326s
user 6m45.201s
sys 0m1.272s
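(In case anyone wants to reproduce: the 16M-object image was created with
something along these lines; the flags are reconstructed from the info output
above rather than copy-pasted, so treat it as a sketch:)

$ rbd create --size 976T --object-size 16M --data-pool rbd.ec.data rbd.meta/fs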
I don't know if the amount written affects the rbd removal time.
>> Rather than a single large rbd, should I be looking at multiple smaller
>> rbds linked together using lvm or somesuch? What are the tradeoffs?
> IMHO there are no tradeoffs; there could even be benefits in creating a
> volume group with multiple physical volumes on RBD, as the requests can
> be better parallelized (e.g. the virtio-scsi single controller for qemu).
That's a good point; I hadn't considered the potential i/o bandwidth benefits.
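If I go that way, I imagine it would look something like the below: four
quarter-sized images striped into one LV. Untested sketch; the vg/lv names are
made up and the device paths assume the usual /dev/rbd/<pool>/<image> udev
symlinks:

$ rbd create --size 244T --object-size 16M --data-pool rbd.ec.data rbd.meta/fs1
$ rbd map rbd.meta/fs1
  (likewise for fs2, fs3, fs4)
$ pvcreate /dev/rbd/rbd.meta/fs{1,2,3,4}
$ vgcreate vg_fs /dev/rbd/rbd.meta/fs{1,2,3,4}
$ lvcreate -i 4 -I 4M -l 100%FREE -n fs vg_fs   # stripe size is a guess, tune to taste
$ mkfs.xfs /dev/vg_fs/fs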
Thanks,
Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx