I’d be nervous about a plan to use a single volume growing indefinitely. I would think that, from a blast-radius perspective, you’d want to strike a balance between a single monolithic blockchain-style volume and a zillion tiny files. Perhaps a strategy to shard into, say, 10 TB volumes (sketched at the end of this mail): that size is large enough to hold lots of immutable code, yet not so unwieldy that it becomes infeasible to manage.

> Packing's obviously a good idea for storing these kinds of artifacts
> in Ceph, and hacking through the existing librbd might indeed be
> easier than building something up from raw RADOS, especially if you
> want to use stuff like rbd-mirror.
>
> My main concern would just be, as Dan points out, that we don't test
> rbd with extremely large images, and we know deleting that image will
> take a looooong time. I don't know of other issues off the top of my
> head, and in the worst case you could always fall back to manipulating
> it with raw librados if there is an issue.
>
> But you might also check in on the status of Danny Al-Gaaf's rados
> email project. Email and these artifacts seemingly have a lot in
> common.
> -Greg
>
> On Mon, Feb 1, 2021 at 12:52 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
>>
>> Hi Dan,
>>
>> On 01/02/2021 21:13, Dan van der Ster wrote:
>>> Hi Loïc,
>>>
>>> We've never managed 100TB+ in a single RBD volume. I can't think of
>>> anything, but perhaps there are some unknown limitations when they
>>> get so big.
>>> It should be easy enough to use rbd bench to create and fill a
>>> massive test image to validate that everything works well at that
>>> size.
>> Good idea! I'll look for a cluster with 100TB of free space and post
>> my findings.
>>>
>>> Also, I assume you'll be doing the IO from just one client? Multiple
>>> readers/writers to a single volume could get complicated.
>> Yes.
>>>
>>> Otherwise, yes, RBD sounds very convenient for what you need.
>> It is inspired by
>> https://static.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf
>> which suggests an ad-hoc implementation to pack immutable objects
>> together. But I think RBD already provides the underlying logic, even
>> though it is not specialized for this use case. RGW also packs small
>> objects together and would be a good candidate, but it provides more
>> flexibility to modify/delete objects, and I assume it would be slower
>> to write N objects with RGW than to write them sequentially to an RBD
>> image. But I did not try, and maybe I should.
>>
>> To be continued.
>>>
>>> Cheers, Dan
>>>
>>> On Sat, Jan 30, 2021, 4:01 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
>>>
>>>> Bonjour,
>>>>
>>>> In the context of Software Heritage (a noble mission to preserve
>>>> all source code)[0], artifacts have an average size of ~3KB and
>>>> there are billions of them. They never change and are never
>>>> deleted. To save space it would make sense to write them, one after
>>>> the other, in an ever-growing RBD volume (more than 100TB). An
>>>> index, located somewhere else, would record the offset and size of
>>>> each artifact in the volume.
>>>>
>>>> I wonder if someone already implemented this idea with success? And
>>>> if not... does anyone see a reason why it would be a bad idea?
>>>>
>>>> Cheers
>>>>
>>>> [0] https://docs.softwareheritage.org/
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
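To make the packing scheme concrete, here is a minimal sketch using the
python-rados and python-rbd bindings. Everything in it is illustrative,
not an existing component: the pool and image names are invented, and a
plain dict stands in for the external index that would record the
(offset, size) pairs.

import rados
import rbd

# Append-only packing: artifacts are written back to back into one RBD
# image, and an external index maps artifact id -> (offset, size). The
# dict below is only a stand-in for that index.

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('artifacts')  # invented pool name

# Assumes the image already exists and is big enough; grow it with
# image.resize() as the write pointer approaches the end.
image = rbd.Image(ioctx, 'pack-0')
index = {}      # artifact id -> (offset, size); would be persisted elsewhere
write_ptr = 0   # next free byte in the image; would also be persisted

def pack_artifact(artifact_id, data):
    """Append one immutable artifact and record where it landed."""
    global write_ptr
    image.write(data, write_ptr)
    index[artifact_id] = (write_ptr, len(data))
    write_ptr += len(data)

def read_artifact(artifact_id):
    """Read an artifact back using only its (offset, size) pair."""
    offset, size = index[artifact_id]
    return image.read(offset, size)

Reads need nothing beyond the offset and size, which is what keeps a
Haystack-style index so compact, and the write path is strictly
sequential, which is the reason to expect it to be faster than RGW for
this ingest pattern (untested, as Loïc says).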
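And a sketch of the 10 TB sharding suggested above, again with invented
names: when the current pack image fills up, start a new one and record
the shard number in the index alongside the offset and size.

import rbd

SHARD_SIZE = 10 * 1024 ** 4  # ~10 TiB per pack image

class ShardedPacker:
    """Illustrative only: roll over to a fresh RBD image every ~10 TiB
    so no single volume grows without bound (smaller blast radius, and
    deleting or migrating any one shard stays feasible)."""

    def __init__(self, ioctx):
        self.ioctx = ioctx
        self.shard = 0      # current pack image number
        self.write_ptr = 0  # next free byte in the current shard
        self.index = {}     # artifact id -> (shard, offset, size)
        self._open_shard()

    def _open_shard(self):
        name = 'pack-%d' % self.shard
        try:
            rbd.RBD().create(self.ioctx, name, SHARD_SIZE)
        except rbd.ImageExists:
            pass  # reopening an existing shard is fine
        self.image = rbd.Image(self.ioctx, name)

    def pack(self, artifact_id, data):
        if self.write_ptr + len(data) > SHARD_SIZE:
            self.image.close()  # shard full: move on to the next one
            self.shard += 1
            self.write_ptr = 0
            self._open_shard()
        self.image.write(data, self.write_ptr)
        self.index[artifact_id] = (self.shard, self.write_ptr, len(data))
        self.write_ptr += len(data)

Each shard stays small enough to be validated with rbd bench at its
real size before going into production, per Dan's suggestion, and it
sidesteps Greg's concern about how long deleting an extremely large
image takes.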