Re: Using RBD to pack billions of small files


Hi Loïc,

Doesn't borg need a file system to write its files to?  We replicate the
chunks incrementally with rsync, which is a very nice and, importantly,
idempotent way to sync data to a second site.

--
Alex Gorbachev
ISS/Storcium



On Mon, Feb 1, 2021 at 2:43 AM Loïc Dachary <loic@xxxxxxxxxxx> wrote:

> Hi Alex,
>
> Using borg would indeed make sense to replicate the RBD content in case
> rbd-mirror is not an option, nice idea :-)
>
> Interestingly there is no need for a proper file system: the files are
> immutable and never deleted. They are indexed by the SHA256 of their
> content, so a map where the key is the SHA256 and the value is the
> (offset, size) in the RBD image would be enough.
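>
> A minimal sketch of that idea, assuming the Python rados/rbd bindings, a
> pre-created and large-enough image, and placeholder pool/image names; the
> in-memory dict stands in for whatever persistent index is kept elsewhere:
>
>     import hashlib
>     import rados, rbd
>
>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>     cluster.connect()
>     ioctx = cluster.open_ioctx('rbd')        # placeholder pool name
>     image = rbd.Image(ioctx, 'artifacts')    # placeholder image name
>
>     index = {}       # sha256 (hex) -> (offset, size); persisted elsewhere
>     write_ptr = 0    # next free byte in the image; persisted elsewhere
>
>     def put(data):
>         """Append an immutable artifact, return its SHA256 key."""
>         global write_ptr
>         key = hashlib.sha256(data).hexdigest()
>         if key not in index:                 # content-addressed: dedup for free
>             image.write(data, write_ptr)
>             index[key] = (write_ptr, len(data))
>             write_ptr += len(data)
>         return key
>
>     def get(key):
>         """Read an artifact back by its SHA256 key."""
>         offset, size = index[key]
>         return image.read(offset, size)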
>
> Cheers
>
> On 01/02/2021 03:27, Alex Gorbachev wrote:
> > Dear Loïc ,
> >
> > I do not have direct experience with this many files, but it resonates
> > for me with deduplication tools such as borg (https://www.borgbackup.org/)
> > or a similar implementation in the latest Proxmox Backup Server (
> > https://pbs.proxmox.com/wiki/index.php/Main_Page).  I think you would
> > need a filesystem for either, so I'm not sure how well this would
> > integrate directly with RBD, but maybe cephfs is an option?  I typically
> > run zfs on top of rbd with only zfs compression, and then put borg on
> > top of zfs.  There is overhead, but operationally this is a very
> > flexible setup.  All the best in your endeavor!
> > --
> > Alex Gorbachev
> > ISS/Storcium
> >
> >
> >
> > On Sat, Jan 30, 2021 at 10:01 AM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
> >
> >> Bonjour,
> >>
> >> In the context of Software Heritage (a noble mission to preserve all
> >> source code)[0], artifacts have an average size of ~3KB and there are
> >> billions of them. They never change and are never deleted. To save
> >> space it would make sense to write them, one after the other, in an
> >> ever-growing RBD volume (more than 100TB). An index, located somewhere
> >> else, would record the offset and size of the artifacts in the volume.
> >>
> >> I wonder if someone already implemented this idea with success? And if
> >> not... does anyone see a reason why it would be a bad idea?
> >>
> >> Cheers
> >>
> >> [0] https://docs.softwareheritage.org/
> >>
> >> --
> >> Loïc Dachary, Artisan Logiciel Libre
> >>
> >>
> >>
> >>
> >>
> >>
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



