I’d be nervous about a plan to use a single volume growing indefinitely. I would think that, from a blast-radius perspective, you’d want to strike a balance between a single monolithic blockchain-style volume and a zillion tiny files. Perhaps a strategy to shard into, say, 10 TB volumes (sketched at the end of this mail): that size is large enough to hold lots of immutable code, yet not so unwieldy that it becomes infeasible to manage.

> Packing's obviously a good idea for storing these kinds of artifacts
> in Ceph, and hacking through the existing librbd might indeed be
> easier than building something up from raw RADOS, especially if you
> want to use stuff like rbd-mirror.
>
> My main concern would just be, as Dan points out, that we don't test
> rbd with extremely large images, and we know deleting that image will
> take a looooong time. I don't know of other issues off the top of my
> head, and in the worst case you could always fall back to manipulating
> it with raw librados if there is an issue.
>
> But you might also check in on the status of Danny Al-Gaaf's rados
> email project. Email and these artifacts seemingly have a lot in
> common.
> -Greg
>
> On Mon, Feb 1, 2021 at 12:52 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
>>
>> Hi Dan,
>>
>> On 01/02/2021 21:13, Dan van der Ster wrote:
>>> Hi Loïc,
>>>
>>> We've never managed 100TB+ in a single RBD volume. I can't think of
>>> anything, but perhaps there are some unknown limitations when they
>>> get so big.
>>> It should be easy enough to use rbd bench to create and fill a
>>> massive test image to validate that everything works well at that
>>> size.
>> Good idea! I'll look for a cluster with 100TB of free space and post
>> my findings.
>>>
>>> Also, I assume you'll be doing the IO from just one client? Multiple
>>> readers/writers to a single volume could get complicated.
>> Yes.
>>>
>>> Otherwise, yes, RBD sounds very convenient for what you need.
>> It is inspired by
>> https://static.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf
>> which suggests an ad-hoc implementation to pack immutable objects
>> together. But I think RBD already provides the underlying logic, even
>> though it is not specialized for this use case. RGW also packs small
>> objects together and would be a good candidate, but it provides more
>> flexibility to modify/delete objects, and I assume it would be slower
>> to write N objects with RGW than to write them sequentially to an RBD
>> image. But I did not try, and maybe I should.
>>
>> To be continued.
>>>
>>> Cheers, Dan
>>>
>>> On Sat, Jan 30, 2021, 4:01 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
>>>
>>>> Bonjour,
>>>>
>>>> In the context of Software Heritage (a noble mission to preserve
>>>> all source code)[0], artifacts have an average size of ~3KB and
>>>> there are billions of them. They never change and are never
>>>> deleted. To save space it would make sense to write them, one after
>>>> the other, in an ever-growing RBD volume (more than 100TB). An
>>>> index, located somewhere else, would record the offset and size of
>>>> each artifact in the volume.
>>>>
>>>> I wonder if someone already implemented this idea with success? And
>>>> if not... does anyone see a reason why it would be a bad idea?
>>>>
>>>> Cheers
>>>>
>>>> [0] https://docs.softwareheritage.org/
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
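To make the packing scheme concrete, here is a minimal sketch using the
python-rados and python-rbd bindings. Everything in it is illustrative,
not an existing component: the pool and image names are invented, and a
plain dict stands in for the external index that would record the
(offset, size) pairs.

import rados
import rbd

# Append-only packing: artifacts are written back to back into one RBD
# image, and an external index maps artifact id -> (offset, size). The
# dict below is only a stand-in for that index.

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('artifacts')  # invented pool name

# Assumes the image already exists and is big enough; grow it with
# image.resize() as the write pointer approaches the end.
image = rbd.Image(ioctx, 'pack-0')
index = {}      # artifact id -> (offset, size); would be persisted elsewhere
write_ptr = 0   # next free byte in the image; would also be persisted

def pack_artifact(artifact_id, data):
    """Append one immutable artifact and record where it landed."""
    global write_ptr
    image.write(data, write_ptr)
    index[artifact_id] = (write_ptr, len(data))
    write_ptr += len(data)

def read_artifact(artifact_id):
    """Read an artifact back using only its (offset, size) pair."""
    offset, size = index[artifact_id]
    return image.read(offset, size)

Reads need nothing beyond the offset and size, which is what keeps a
Haystack-style index so compact, and the write path is strictly
sequential, which is the reason to expect it to be faster than RGW for
this ingest pattern (untested, as Loïc says).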
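And a sketch of the 10 TB sharding suggested above, again with invented
names: when the current pack image fills up, start a new one and record
the shard number in the index alongside the offset and size.

import rbd

SHARD_SIZE = 10 * 1024 ** 4  # ~10 TiB per pack image

class ShardedPacker:
    """Illustrative only: roll over to a fresh RBD image every ~10 TiB
    so no single volume grows without bound (smaller blast radius, and
    deleting or migrating any one shard stays feasible)."""

    def __init__(self, ioctx):
        self.ioctx = ioctx
        self.shard = 0      # current pack image number
        self.write_ptr = 0  # next free byte in the current shard
        self.index = {}     # artifact id -> (shard, offset, size)
        self._open_shard()

    def _open_shard(self):
        name = 'pack-%d' % self.shard
        try:
            rbd.RBD().create(self.ioctx, name, SHARD_SIZE)
        except rbd.ImageExists:
            pass  # reopening an existing shard is fine
        self.image = rbd.Image(self.ioctx, name)

    def pack(self, artifact_id, data):
        if self.write_ptr + len(data) > SHARD_SIZE:
            self.image.close()  # shard full: move on to the next one
            self.shard += 1
            self.write_ptr = 0
            self._open_shard()
        self.image.write(data, self.write_ptr)
        self.index[artifact_id] = (self.shard, self.write_ptr, len(data))
        self.write_ptr += len(data)

Each shard stays small enough to be validated with rbd bench at its
real size before going into production, per Dan's suggestion, and it
sidesteps Greg's concern about how long deleting an extremely large
image takes.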