Re: Large rbd

On Thu, Jan 21, 2021 at 07:52:00PM -0500, Jason Dillaman wrote:
On Thu, Jan 21, 2021 at 6:18 PM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
On Thu, Jan 21, 2021 at 10:57:49AM +0100, Robert Sander wrote:
On 21.01.21 at 05:42, Chris Dunlop wrote:
Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or is
it just "this is crazy large, if you're trying to go over this you're
doing something wrong, rethink your life..."?

IMHO the limit is there because of the way deletion of RBDs works. "rbd
rm" has to look for every object, not only the ones that were really
created. This would make deleting a very, very large RBD take a very, very
long time.

I wouldn't have thought the Ceph designers would have put in a hard limit
like that just to protect people from long delete times.

You are free to disable the object-map when creating large images by
specifying the image-features -- or you can increase the object size
from its default 4MiB allocation size (which is honestly really no
different from QCOW2 increasing the backing cluster size as the image
grows larger).

The issue is that a 1PiB image w/ 4MiB objects is going to have
268,435,456 backing objects, so its object-map will require 64MiB of
memory to store. It also just so happens that Ceph has a hard limit on
the maximum object size of around 90MiB, if I recall correctly.
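
Just to check I'm following the arithmetic (assuming the object-map stores
2 bits of state per backing object):

  1PiB / 4MiB per object = 2^50 / 2^22 = 2^28 = 268,435,456 objects
  268,435,456 objects x 2 bits = 67,108,864 bytes = 64MiB

...and, if I've got that right, the same 1PiB image with 16MiB objects
would only need a 16MiB object-map.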

Is the whole object-map held "in-core" the whole time, or is it retrieved / freed as needed? (For a busy filesystem that might mean it's in-core practically the whole time, but for a quiet filesystem maybe not so much.)

In this case the server has 192G RAM so 64MiB doesn't sound so scary.

By the way - does that 64MiB also match approx how much fast/replicated storage I'll need if the data is on an ec volume?

I'm looking at 16M or larger objects, however I'm concerned about fragmentation and how that might affect the "thinness" of the volume. With larger objects, and in the face of file removals in the upper filesystem (where many files are smaller than the object size), there's a far greater chance of ending up with many objects that are significantly, but not completely, empty, blowing out the actual storage used compared to the logical storage used in the upper filesystem. Trimming won't help for partially allocated objects.
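
As a rough illustration with made-up numbers: if churn in the upper fs ends up leaving each 16MiB object with, say, only 2MiB of live file data, the image would be consuming around 8x the logical space reported by the filesystem, and none of those objects could be returned by a trim because they're all still partially in use.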

Actually, maybe that's a good reason to not create a humongous fs in the first place.

The XFS devs seem comfortable with growing a fs "a bit", e.g. 2-5 times its original size, but at about 10 times it seemingly starts getting a bit dodgy.

So, creating a smaller fs in the first place (with the expectation it may grow to 2-5 times its original size) means the fs itself will be encouraged to reuse the space in larger objects rather than spreading itself out and leaving a large number of partially filled objects.
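
In other words, something like this (a sketch only -- names and sizes are
hypothetical, and assuming I have the rbd flags right):

  # create the image deliberately smaller than the eventual maximum,
  # with larger objects, and with the data on the EC pool
  rbd create --size 100T --object-size 16M --data-pool ec-data rbd-meta/bigfs
  rbd map rbd-meta/bigfs
  mkfs.xfs /dev/rbd/rbd-meta/bigfs

  # ...then later grow in modest steps as needed
  rbd resize --size 200T rbd-meta/bigfs
  xfs_growfs /mnt/bigfs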

I don't know if the amount written affects the rbd removal time.

When the object-map is enabled, only written data extents need to be
deleted. W/o the object-map, it would need to issue deletes against
all possible objects.

How does the lack of an object-map affect trimming? That's a very important factor for a large thin volume like this.
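
(By "trimming" here I mean something like a periodic

  fstrim -v /mnt/bigfs

on the mounted filesystem -- mount point hypothetical -- so that the discards get passed down to the backing objects.)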

Thanks,

Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


