Dear Loïc,

I do not have direct experience with this many files, but it resonates
for me with deduplication, such as borg (https://www.borgbackup.org/)
or a similar implementation in the latest Proxmox Backup Server
(https://pbs.proxmox.com/wiki/index.php/Main_Page). I think you would
need a filesystem for either, so I am not sure how well this would
integrate directly with RBD, but maybe cephfs is an option? I typically
run zfs on top of rbd with only zfs compression enabled, and then put
borg on top of zfs. There is overhead, but it is a very flexible setup
operationally.

All the best in your endeavor!
--
Alex Gorbachev
ISS/Storcium

On Sat, Jan 30, 2021 at 10:01 AM Loïc Dachary <loic@xxxxxxxxxxx> wrote:
> Bonjour,
>
> In the context of Software Heritage (a noble mission to preserve all
> source code)[0], artifacts have an average size of ~3KB and there are
> billions of them. They never change and are never deleted. To save
> space it would make sense to write them, one after the other, into an
> ever-growing RBD volume (more than 100TB). An index, located somewhere
> else, would record the offset and size of each artifact in the volume.
>
> I wonder if someone has already implemented this idea with success?
> And if not... does anyone see a reason why it would be a bad idea?
>
> Cheers
>
> [0] https://docs.softwareheritage.org/
>
> --
> Loïc Dachary, Artisan Logiciel Libre

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
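
[Editor's note: the layering Alex describes boils down to a handful of
commands. Below is a minimal sketch, driven from Python for concreteness;
the pool/image name "swh/backing", the zpool name "swhpool", the repository
path, and the assumption that the image maps to /dev/rbd0 are all
hypothetical placeholders, not Alex's actual configuration.]

    import subprocess

    def run(*cmd):
        """Run a command, echoing it first; abort on failure."""
        print('+', ' '.join(cmd))
        subprocess.run(cmd, check=True)

    # 1. Expose an existing RBD image as a block device
    #    (hypothetical pool/image; assume it appears as /dev/rbd0).
    run('rbd', 'map', 'swh/backing')

    # 2. Put zfs on top, with compression as the only zfs feature in use.
    run('zpool', 'create', 'swhpool', '/dev/rbd0')
    run('zfs', 'set', 'compression=lz4', 'swhpool')

    # 3. Layer borg on top of zfs, which deduplicates at backup time.
    run('borg', 'init', '--encryption=none', '/swhpool/repo')
    run('borg', 'create', '/swhpool/repo::artifacts-1', '/srv/artifacts')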
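
[Editor's note: Loïc's append-plus-index proposal itself can be sketched
with the Python rados/rbd bindings. The pool and image names are
hypothetical, and the in-memory index and tail counter stand in for
the external index, which in practice would live in a durable store.]

    import rados
    import rbd

    POOL = 'swh-artifacts'      # hypothetical pool name
    IMAGE = 'artifact-log'      # hypothetical, pre-created growing image
    GROW_STEP = 64 * 1024**3    # grow the image 64 GiB at a time

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx(POOL)
    image = rbd.Image(ioctx, IMAGE)

    index = {}   # artifact id -> (offset, length); external in practice
    tail = 0     # next free byte; must be persisted alongside the index

    def append_artifact(artifact_id, data):
        """Write an artifact at the current tail and record where it went."""
        global tail
        if tail + len(data) > image.size():
            # RBD images are thin-provisioned, so growing ahead is cheap.
            image.resize(image.size() + GROW_STEP)
        image.write(data, tail)
        index[artifact_id] = (tail, len(data))
        tail += len(data)

    def read_artifact(artifact_id):
        offset, length = index[artifact_id]
        return image.read(offset, length)

Since artifacts never change and are never deleted, reads need no locking;
the hard part is keeping the index and the tail pointer crash-consistent
with the writes, which the sketch above deliberately leaves out.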