Re: Using RBD to pack billions of small files

Packing's obviously a good idea for storing these kinds of artifacts
in Ceph, and hacking through the existing librbd might indeed be
easier than building something up from raw RADOS, especially if you
want to use stuff like rbd-mirror.
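For what it's worth, the librbd route doesn't need much code. A rough,
untested sketch with the Python rados/rbd bindings (pool and image names
are placeholders, and the dict stands in for whatever external index you
keep):

import rados
import rbd

# Placeholders: pool "swh-packs", image "pack-000"; the image is assumed
# to have been created beforehand, e.g. with rbd.RBD().create().
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('swh-packs')
image = rbd.Image(ioctx, 'pack-000')

index = {}      # artifact id -> (offset, length); would live elsewhere
write_ptr = 0   # next free byte in the image

def append_artifact(artifact_id, payload):
    # Append one immutable artifact to the image and record where it went.
    global write_ptr
    image.write(payload, write_ptr)
    index[artifact_id] = (write_ptr, len(payload))
    write_ptr += len(payload)

append_artifact('blob-0001', b'example artifact payload')
image.close()
ioctx.close()
cluster.shutdown()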

My main concern is just what Dan points out: we don't test rbd with
extremely large images, and we know deleting an image that size will
take a looooong time. I don't know of other issues off the top of my
head, and in the worst case you could always fall back to manipulating
the data with raw librados if a problem does turn up.
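If it did come to that, the raw librados fallback is also not much code:
keep the same append-and-index scheme, but map logical offsets onto
fixed-size RADOS objects yourself. Another rough sketch (the object size
and naming scheme are arbitrary choices here):

import rados

OBJ_SIZE = 64 * 1024 * 1024  # 64 MiB per backing RADOS object, arbitrary

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('swh-packs')  # placeholder pool name

def write_at(logical_offset, payload):
    # Split the write across the fixed-size objects backing the "volume".
    while payload:
        obj = 'pack-000.%08d' % (logical_offset // OBJ_SIZE)
        off = logical_offset % OBJ_SIZE
        chunk = payload[:OBJ_SIZE - off]
        ioctx.write(obj, chunk, off)
        logical_offset += len(chunk)
        payload = payload[len(chunk):]

def read_at(logical_offset, length):
    data = b''
    while length > 0:
        obj = 'pack-000.%08d' % (logical_offset // OBJ_SIZE)
        off = logical_offset % OBJ_SIZE
        chunk = ioctx.read(obj, min(length, OBJ_SIZE - off), off)
        if not chunk:
            break  # hole or missing object: stop rather than loop forever
        data += chunk
        logical_offset += len(chunk)
        length -= len(chunk)
    return data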

But you might also check in on the status of Danny Al-Gaaf's rados
email project. Email and these artifacts seemingly have a lot in
common.
-Greg

On Mon, Feb 1, 2021 at 12:52 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
>
> Hi Dan,
>
> On 01/02/2021 21:13, Dan van der Ster wrote:
> > Hi Loïc,
> >
> > We've never managed 100TB+ in a single RBD volume. I can't think of any
> > specific problems, but perhaps there are unknown limitations when images
> > get that big.
> > It should be easy enough to use rbd bench to create and fill a massive test
> > image to validate everything works well at that size.
> Good idea! I'll look for a cluster with 100TB of free space and post my findings.
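> rbd bench is probably the simplest way to do it, but if I end up scripting the
> fill it could be as little as this sketch with the Python rbd bindings (pool and
> image names are placeholders; the image is thin provisioned, so creating it
> costs nothing and only the writes consume space):
>
> import rados
> import rbd
>
> cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
> cluster.connect()
> ioctx = cluster.open_ioctx('rbd')  # placeholder pool
> rbd.RBD().create(ioctx, 'bigtest', 100 * 1024 ** 4)  # 100 TiB, thin provisioned
>
> image = rbd.Image(ioctx, 'bigtest')
> chunk = b'\xab' * (4 * 1024 * 1024)  # 4 MiB sequential writes
> offset = 0
> while offset < image.size():
>     image.write(chunk, offset)
>     offset += len(chunk)
> image.close()
> ioctx.close()
> cluster.shutdown()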
> >
> > Also, I assume you'll be doing the IO from just one client? Multiple
> > readers/writers to a single volume could get complicated.
> Yes.
> >
> > Otherwise, yes RBD sounds very convenient for what you need.
> It is inspired by https://static.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf which suggests an ad-hoc implementation to pack immutable objects together. But I think RBD already provides the underlying logic, even though it is not specialized for this use case. RGW also packs small objects together and would be a good candidate: it offers more flexibility to modify and delete objects, but I assume writing N objects through RGW will be slower than writing them sequentially to an RBD image. I have not tried it, though, and maybe I should.
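> A quick and dirty way to test that assumption could look like the following (timings only,
> nothing scientific; the RGW endpoint, credentials, bucket and image names are placeholders):
>
> import time
>
> import boto3
> import rados
> import rbd
>
> blobs = [b'x' * 3072 for _ in range(10000)]  # ~3KB artifacts
>
> # N sequential writes into an RBD image (created beforehand).
> cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
> cluster.connect()
> ioctx = cluster.open_ioctx('rbd')      # placeholder pool
> image = rbd.Image(ioctx, 'packtest')   # placeholder image
> start, offset = time.time(), 0
> for blob in blobs:
>     image.write(blob, offset)
>     offset += len(blob)
> image.flush()
> print('rbd: %.1fs' % (time.time() - start))
> image.close()
>
> # The same N blobs as individual objects through RGW's S3 API.
> s3 = boto3.client('s3', endpoint_url='http://rgw.example.com:8080',
>                   aws_access_key_id='KEY', aws_secret_access_key='SECRET')
> start = time.time()
> for i, blob in enumerate(blobs):
>     s3.put_object(Bucket='packtest', Key='blob-%d' % i, Body=blob)
> print('rgw: %.1fs' % (time.time() - start))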
>
> To be continued.
> >
> > Cheers, Dan
> >
> >
> > On Sat, Jan 30, 2021, 4:01 PM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
> >
> >> Bonjour,
> >>
> >> In the context of Software Heritage (a noble mission to preserve all source
> >> code)[0], artifacts have an average size of ~3KB and there are billions of
> >> them. They never change and are never deleted. To save space it would make
> >> sense to write them, one after the other, into an ever-growing RBD volume
> >> (more than 100TB). An index, located somewhere else, would record the
> >> offset and size of each artifact in the volume.
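> >>
> >> Reading an artifact back would then be a single librbd read at the recorded
> >> offset, along these lines (pool and image names are placeholders):
> >>
> >> import rados
> >> import rbd
> >>
> >> cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
> >> cluster.connect()
> >> ioctx = cluster.open_ioctx('rbd')         # placeholder pool
> >> image = rbd.Image(ioctx, 'artifacts')     # placeholder image
> >> offset, size = 123456789, 3072            # looked up in the external index
> >> data = image.read(offset, size)
> >> image.close()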
> >>
> >> I wonder if someone already implemented this idea with success? And if
> >> not... does anyone see a reason why it would be a bad idea?
> >>
> >> Cheers
> >>
> >> [0] https://docs.softwareheritage.org/
> >>
> >> --
> >> Loïc Dachary, Artisan Logiciel Libre
> >>
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



