Re: Using RBD to pack billions of small files

Hi Matt,

I did not know about pixz, thanks for the pointer. The idea it implements is also new to me and it looks like it can
usefully be applied to this use case. I'm not going to say "awesome" because I can't grasp how useful it really is
right now. But I'll definitely think about it :-)

Cheers

On 03/02/2021 22:02, Matt Wilder wrote:
> If it were me, I would do something along the lines of:
>
> - Bundle larger blocks of code into pixz <https://github.com/vasi/pixz>
>   (essentially indexed tar files, allowing random access) and store them
>   in RadosGW.
> - Build a small frontend that fetches them (with caching) and provides the
>   file contents via whatever your UI is.
>
> On Wed, Feb 3, 2021 at 12:55 AM Burkhard Linke <
> Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> On 2/3/21 9:41 AM, Loïc Dachary wrote:
>>>> Just my 2 cents:
>>>>
>>>> You could use the first byte of the SHA sum to identify the image, e.g.
>>>> using a fixed number of 256 images. Or some flexible approach similar to
>>>> the way filestore used to store rados objects.
>>> A friend suggested the same to save space. Good idea.
>>
>> If you want to further reduce the index size, you can just store the
>> offset, and the first 4? 8? bytes at that offset define the size of the
>> following artifacts. That's similar to the way Pascal used to store
>> strings in the good ol' times. You might also want to think about using
>> a complete header which also includes the artifact's name etc. This will
>> allow you to rebuild the index if it becomes corrupted. The storage
>> overhead should be insignificant.
>>
>> Your index will become a simple mapping of SHA sum -> offset, and you
>> might also be able to use optimized implementations.
>>
>>
>> Regards,
>>
>> Burkhard
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
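
For illustration only, here is a minimal sketch of the layout Burkhard describes above: each record carries a small header (SHA sum plus payload length) in front of the artifact, the index maps SHA sum -> offset only, and the index can be rebuilt by scanning the headers. The record layout, the choice of SHA-256, the 8-byte length field, and all function names are assumptions made for this example, not something agreed on in the thread. A file-like object stands in for the RBD image.

```python
import hashlib
import io
import struct

# Hypothetical record layout (an assumption, not from the thread):
#   32-byte SHA-256 digest | 8-byte big-endian payload length | payload
HEADER = struct.Struct(">32sQ")


def image_for(digest: bytes) -> int:
    """Pick one of 256 images from the first byte of the SHA sum."""
    return digest[0]


def append_artifact(image, index: dict, payload: bytes) -> bytes:
    """Append a self-describing record; the index keeps only its offset."""
    digest = hashlib.sha256(payload).digest()
    offset = image.seek(0, io.SEEK_END)
    image.write(HEADER.pack(digest, len(payload)))
    image.write(payload)
    index[digest] = offset
    return digest


def read_artifact(image, index: dict, digest: bytes) -> bytes:
    """Seek to the stored offset and read the size from the record header."""
    image.seek(index[digest])
    stored_digest, size = HEADER.unpack(image.read(HEADER.size))
    if stored_digest != digest:
        raise ValueError("index points at the wrong record")
    return image.read(size)


def rebuild_index(image) -> dict:
    """Recreate the SHA sum -> offset mapping by walking the headers."""
    index = {}
    offset = 0
    image.seek(0)
    while True:
        header = image.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of image (or a truncated trailing record)
        digest, size = HEADER.unpack(header)
        index[digest] = offset
        image.seek(size, io.SEEK_CUR)  # skip the payload
        offset += HEADER.size + size
    return index
```

With one `io.BytesIO()` per image, `append_artifact` followed by `rebuild_index` should give back the same mapping, at a cost of 40 bytes of header per artifact in this particular layout.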

-- 
Loïc Dachary, Artisan Logiciel Libre



