Re: Large rbd

On Thu, Jan 21, 2021 at 07:52:00PM -0500, Jason Dillaman wrote:
On Thu, Jan 21, 2021 at 6:18 PM Chris Dunlop <chris@xxxxxxxxxxxx> wrote:
On Thu, Jan 21, 2021 at 10:57:49AM +0100, Robert Sander wrote:
On 21.01.21 at 05:42, Chris Dunlop wrote:
Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or is
it just "this is crazy large, if you're trying to go over this you're
doing something wrong, rethink your life..."?

IMHO the limit is there because of the way deletion of RBDs works. "rbd
rm" has to look for every object, not only the ones that were really
created. This would make deleting a very, very large RBD take a very, very
long time.

I wouldn't have thought the Ceph designers would have put in a hard limit
like that just to protect people from long delete times.

You are free to disable the object-map when creating large images by
specifying the image-features -- or you can increase the object size
from its default 4MiB allocation size (which is honestly really no
different from QCOW2 increasing the backing cluster size as the image
grows larger).

The issue is that a 1PiB image w/ 4MiB objects is going to have
268,435,456 backing objects, so its object-map will require 64MiB of
memory to store. It also just so happens that Ceph has a hard limit on
the maximum object size of around 90MiB, if I recall correctly.
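
Just to check I'm following the arithmetic (assuming the object-map stores
2 bits of state per backing object):

  1PiB / 4MiB per object = 2^50 / 2^22 = 2^28 = 268,435,456 objects
  268,435,456 objects x 2 bits = 67,108,864 bytes = 64MiB

...and, if I've got that right, the same 1PiB image with 16MiB objects
would only need a 16MiB object-map.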

Is the whole object-map held "in-core" the whole time, or is it retrieved / freed as needed? (For a busy filesystem that might mean it's in-core practically the whole time, but for a quiet filesystem maybe not so much.)

In this case the server has 192G RAM so 64MiB doesn't sound so scary.

By the way - does that 64MiB also match approx how much fast/replicated storage I'll need if the data is on an ec volume?

I'm looking at 16M or larger objects, however I'm concerned about fragmentation and how that might affect the "thinness" of the volume. With larger objects, and in the face of file removals in the upper filesystem (where many files are smaller than the object size), there's a far greater chance of ending up with many objects that are significantly, but not completely, empty, blowing out the actual storage used compared to the logical storage used in the upper filesystem. Trimming won't help for partially allocated objects.
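
As a rough illustration with made-up numbers: if churn in the upper fs ends up leaving each 16MiB object with, say, only 2MiB of live file data, the image would be consuming around 8x the logical space reported by the filesystem, and none of those objects could be returned by a trim because they're all still partially in use.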

Actually, maybe that's a good reason to not create a humongous fs in the first place.

The XFS devs seem comfortable with growing a fs "a bit", e.g. 2-5 times its original size, but at about 10 times it seemingly starts getting a bit dodgy.

So, creating a smaller fs in the first place (with the expectation it may grow to 2-5 times its original size) means the fs itself will be encouraged to reuse the space in larger objects rather than spreading itself out and leaving a large number of partially filled objects.
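
In other words, something like this (a sketch only -- names and sizes are
hypothetical, and assuming I have the rbd flags right):

  # create the image deliberately smaller than the eventual maximum,
  # with larger objects, and with the data on the EC pool
  rbd create --size 100T --object-size 16M --data-pool ec-data rbd-meta/bigfs
  rbd map rbd-meta/bigfs
  mkfs.xfs /dev/rbd/rbd-meta/bigfs

  # ...then later grow in modest steps as needed
  rbd resize --size 200T rbd-meta/bigfs
  xfs_growfs /mnt/bigfs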

I don't know if the amount written affects the rbd removal time.

When the object-map is enabled, only written data extents need to be
deleted. W/o the object-map, it would need to issue deletes against
all possible objects.

How does the lack of an object-map affect trimming? That's a very important factor for a large thin volume like this.
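
(By "trimming" here I mean something like a periodic

  fstrim -v /mnt/bigfs

on the mounted filesystem -- mount point hypothetical -- so that the discards get passed down to the backing objects.)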

Thanks,

Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


