Re: Using RBD to pack billions of small files

Burkhard Linke <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> · Wed, 3 Feb 2021 09:54:26 +0100

Hi,

On 2/3/21 9:41 AM, Loïc Dachary wrote:
>> Just my 2 cents:
>>
>> You could use the first byte of the SHA sum to identify the image, e.g. using a fixed number of 256 images. Or some flexible approach similar to the way filestore used to store rados objects.
> A friend suggested the same to save space. Good idea.
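
As a rough sketch of the fixed 256-image variant: the first byte of the digest picks the image. The `artifacts-xx` naming scheme and the use of SHA-256 are assumptions made up for illustration, not anything the setup prescribes.

```python
import hashlib

NUM_IMAGES = 256  # one image per possible value of the first digest byte


def image_for(sha_digest: bytes) -> str:
    """Pick an image name from the first byte of the SHA digest."""
    # "artifacts-xx" is a hypothetical naming scheme.
    return f"artifacts-{sha_digest[0]:02x}"


digest = hashlib.sha256(b"example artifact").digest()
print(image_for(digest))
```

Since SHA output is uniformly distributed, the artifacts should spread evenly over the images without any extra bookkeeping.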

If you want to reduce the index size further, you can store just the
offset and let the first 4 or 8 bytes at that offset hold the size of the
artifact that follows. That's similar to the way Pascal used to store
strings in the good ol' times. You might also want to think about using
a complete header that also includes the artifact's name etc. This will
allow you to rebuild the index if it becomes corrupted. The storage
overhead should be insignificant.
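
To make the idea concrete, here is a minimal sketch of such a record. The layout (8-byte length, 2-byte name length, 32-byte SHA-256, then name and payload) and all function names are my assumptions for illustration, and an in-memory `io.BytesIO` stands in for the RBD image.

```python
import hashlib
import io
import struct

# Hypothetical record layout (field sizes are assumptions):
#   8-byte payload length | 2-byte name length | 32-byte SHA-256 | name | payload
HEADER = struct.Struct("<QH32s")


def append_record(image, name: str, payload: bytes) -> int:
    """Append one artifact and return the offset where its record starts."""
    offset = image.seek(0, io.SEEK_END)
    encoded_name = name.encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    image.write(HEADER.pack(len(payload), len(encoded_name), digest))
    image.write(encoded_name)
    image.write(payload)
    return offset


def read_record(image, offset: int):
    """Read back (name, payload) knowing only the record's offset."""
    image.seek(offset)
    size, name_len, digest = HEADER.unpack(image.read(HEADER.size))
    name = image.read(name_len).decode("utf-8")
    payload = image.read(size)
    # The stored digest lets us detect a damaged record on read.
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError(f"corrupt record at offset {offset}")
    return name, payload


image = io.BytesIO()  # stand-in for an RBD image
first = append_record(image, "a.txt", b"hello")
second = append_record(image, "b.txt", b"world!")
print(read_record(image, second))
```

With this layout the fixed part of the header costs 42 bytes per artifact, and the index only has to remember one integer per SHA sum.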

Your index then becomes a simple mapping of SHA sum -> offset with
fixed-size keys and values, so you might also be able to use an
optimized implementation for it.
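
A sketch of how the index could be rebuilt from the data alone, assuming a minimal header that holds just the payload length and the SHA-256. The layout, the function names, and the plain `dict` used as the index are hypothetical, and `io.BytesIO` again stands in for the image.

```python
import hashlib
import io
import struct

# Hypothetical minimal layout: 8-byte payload length | 32-byte SHA-256 | payload
HEADER = struct.Struct("<Q32s")


def append(image, index: dict, payload: bytes) -> bytes:
    """Append a payload and record SHA sum -> offset in the index."""
    digest = hashlib.sha256(payload).digest()
    offset = image.seek(0, io.SEEK_END)
    image.write(HEADER.pack(len(payload), digest) + payload)
    index[digest] = offset
    return digest


def rebuild_index(image) -> dict:
    """Recover the SHA sum -> offset mapping by walking the records in order."""
    index = {}
    end = image.seek(0, io.SEEK_END)
    offset = 0
    while offset < end:
        image.seek(offset)
        size, digest = HEADER.unpack(image.read(HEADER.size))
        index[digest] = offset
        # Skip over the payload to the start of the next header.
        offset += HEADER.size + size
    return index


image, index = io.BytesIO(), {}
append(image, index, b"one")
append(image, index, b"three")
print(rebuild_index(image) == index)
```

This assumes the data ends where the file ends. On a preallocated image you would additionally need some way to mark the end of the written region, e.g. a stored write pointer or a zero-length terminator record.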

Regards,

Burkhard
