Packing's obviously a good idea for storing these kinds of artifacts in Ceph, and hacking through the existing librbd might indeed be easier than building something up from raw RADOS, especially if you want to use stuff like rbd-mirror. My main concern would just be, as Dan points out, that we don't test rbd with extremely large images, and we know deleting an image that size will take a looooong time. I don't know of other issues off the top of my head, and in the worst case you could always fall back to manipulating it with raw librados if a problem does come up.

But you might also check in on the status of Danny Al-Gaaf's rados email project. Email and these artifacts seemingly have a lot in common.
-Greg

On Mon, Feb 1, 2021 at 12:52 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
>
> Hi Dan,
>
> On 01/02/2021 21:13, Dan van der Ster wrote:
> > Hi Loïc,
> >
> > We've never managed 100TB+ in a single RBD volume. I can't think of
> > anything, but perhaps there are some unknown limitations when they get so
> > big.
> > It should be easy enough to use rbd bench to create and fill a massive test
> > image to validate everything works well at that size.
> Good idea! I'll look for a cluster with 100TB of free space and post my findings.
> >
> > Also, I assume you'll be doing the IO from just one client? Multiple
> > readers/writers to a single volume could get complicated.
> Yes.
> >
> > Otherwise, yes RBD sounds very convenient for what you need.
> It is inspired by https://static.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf,
> which suggests an ad-hoc implementation to pack immutable objects together. But I
> think RBD already provides the underlying logic, even though it is not specialized
> for this use case. RGW also packs small objects together and would be a good
> candidate, but it provides more flexibility to modify/delete objects, and I assume
> it would be slower to write N objects with RGW than to write them sequentially to
> an RBD image. I did not try, though, and maybe I should.
>
> To be continued.
> >
> > Cheers, Dan
> >
> >
> > On Sat, Jan 30, 2021, 4:01 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
> >
> >> Bonjour,
> >>
> >> In the context of Software Heritage (a noble mission to preserve all source
> >> code)[0], artifacts have an average size of ~3KB and there are billions of
> >> them. They never change and are never deleted. To save space it would make
> >> sense to write them, one after the other, to an ever-growing RBD volume
> >> (more than 100TB). An index, located somewhere else, would record the
> >> offset and size of each artifact in the volume.
> >>
> >> I wonder if someone has already implemented this idea with success? And if
> >> not... does anyone see a reason why it would be a bad idea?
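
[Editor's illustration] To make the packing scheme described above concrete, here is a minimal sketch using the python-rados and python-rbd bindings: artifacts are appended one after the other at the current tail of a large thin-provisioned image, and an index records the (offset, size) of each one. The pool name, image name, config path, and the in-memory dict standing in for the external index are assumptions for illustration only; error handling and durable index updates are omitted.

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # assumed config path
cluster.connect()
ioctx = cluster.open_ioctx('rbd')                       # assumed pool name

# Create a thin-provisioned ~100 TB image; only written extents consume space.
rbd.RBD().create(ioctx, 'artifacts', 100 * 1024 ** 4)

image = rbd.Image(ioctx, 'artifacts')
index = {}   # artifact id -> (offset, size); stands in for the real external index
tail = 0     # next free byte in the image

for artifact_id, blob in [('id-1', b'first artifact'), ('id-2', b'second artifact')]:
    image.write(blob, tail)                 # append at the current end of the packed data
    index[artifact_id] = (tail, len(blob))  # remember where it landed
    tail += len(blob)

# Reading an artifact back only needs the (offset, size) recorded in the index.
offset, size = index['id-2']
assert image.read(offset, size) == b'second artifact'

image.close()
ioctx.close()
cluster.shutdown()

Because RBD images are thin provisioned, creating the full address space up front costs nothing; only the extents that are actually written consume storage, so the image can simply be created at (or resized toward) its eventual maximum size.
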
> >>
> >> Cheers
> >>
> >> [0] https://docs.softwareheritage.org/
> >>
> >> --
> >> Loïc Dachary, Artisan Logiciel Libre
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
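
[Editor's illustration] Greg's point that one can always fall back to raw librados if librbd misbehaves at this scale could look roughly like the sketch below. It assumes a format-2 image with the default 4 MiB object size and default striping, and that an artifact never crosses an object boundary; a real implementation would read the image's actual layout rather than hard-coding it, and would split reads that straddle backing objects. The pool and image names match the earlier sketch and are assumptions.

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')

# librbd stores a format-2 image's data in RADOS objects named
# rbd_data.<image id>.<object number in hex>; look up the image id once.
image = rbd.Image(ioctx, 'artifacts', read_only=True)
image_id = image.id()
image.close()
object_size = 4 * 1024 ** 2   # assumes the default object size; check the real image

def read_raw(offset, size):
    """Read `size` bytes at image `offset` directly from RADOS, bypassing librbd.
    Assumes the range fits inside one backing object."""
    name = 'rbd_data.%s.%016x' % (image_id, offset // object_size)
    return ioctx.read(name, size, offset % object_size)

# The offset and size would come from the external index described in the thread,
# e.g. the 14 bytes written at offset 0 in the earlier sketch.
print(read_raw(0, 14))

ioctx.close()
cluster.shutdown()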