Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

On Fri, 25 Feb 2022 at 08:49, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
> There was a similar discussion last year around Software Heritage’s archive project; I suggest digging up that thread.
> Some ideas:
>
> * Pack them into (optionally compressed) tarballs - from a quick search it sorta looks like HAR uses a similar model.  Store the tarballs as RGW objects, or as RBD volumes, or on CephFS.

Having built several different kinds of storage solutions in my career,
I can say the advice above is REALLY important. Many hard-to-solve
problems have started out with "it is just one million files/objects".
When you reach 50 million and sound the alarm, people try to throw
money at the problem instead. Then you reach 200-400M, and you can no
longer list the index in any reasonable time without it being stale by
the time the listing completes.

If you have the chance to stick 10, 100 or 1000 small items into a
.tar, a .zip, or whatever, DO IT. Do it before the numbers grow too
large to handle. Once they have grown too big, you seldom get the
chance to both keep running the too-large setup AND re-pack everything
at the same time.
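
As a rough illustration of the packing step, here is a minimal Python
sketch that groups small files into numbered, gzip-compressed tarballs
using only the standard library. The function names, the default batch
size of 1000 and the "batch-NNNNNN.tar.gz" naming scheme are my own
assumptions for the example, not anything Ceph or HAR prescribes. How
the resulting tarballs are then stored (RGW, RBD, CephFS) is left out.

```python
import tarfile
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def batched(paths: Iterable[Path], size: int) -> Iterator[List[Path]]:
    """Yield successive lists of at most `size` paths."""
    it = iter(paths)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def pack_small_files(src_dir: Path, out_dir: Path,
                     batch_size: int = 1000) -> List[Path]:
    """Pack regular files under src_dir into numbered .tar.gz archives.

    Hypothetical helper: each archive holds up to batch_size files, and
    member names are stored relative to src_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Build the full file list up front so archives written during the
    # run are never picked up as input.
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    archives: List[Path] = []
    for n, batch in enumerate(batched(files, batch_size)):
        archive = out_dir / f"batch-{n:06d}.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            for path in batch:
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
        archives.append(archive)
    return archives
```

The point of the sketch is only that the object count seen by the
storage layer drops by roughly the batch size. In practice you would
also want to record which item ended up in which archive, so that
single items can be found again without opening every tarball.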

-- 
May the most significant bit of your life be positive.