Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

You bet, glad to help.  

Zillions of small files do carry relatively higher metadata overhead, and they can be problematic in several ways.  When using RGW, indexless buckets may be advantageous, at the cost of not being able to list the bucket's contents.  

Another phenomenon is space amplification.  With, say, a 1 GB file/object, the partially filled last allocation unit wastes a trivial amount of space (sometimes called internal fragmentation).  As files get smaller, that waste becomes an increasingly large fraction of the space consumed. 
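To put rough numbers on this, here is a minimal Python sketch.  It assumes each file/object is simply rounded up to whole allocation units of min_alloc_size and ignores metadata, compression, and replication/EC.  The 4 KiB and 64 KiB unit sizes are illustrative, and the helper names are hypothetical, not part of any Ceph API.

```python
KiB = 1024
GiB = 1024 ** 3


def allocated_bytes(size: int, min_alloc: int) -> int:
    """Bytes consumed when `size` is rounded up to whole allocation units."""
    # Integer ceiling division avoids float rounding for large sizes.
    return -(-size // min_alloc) * min_alloc


def space_amplification(size: int, min_alloc: int) -> float:
    """Ratio of allocated bytes to logical bytes (1.0 means no waste)."""
    return allocated_bytes(size, min_alloc) / size


if __name__ == "__main__":
    # Compare a large object with progressively smaller ones.
    for size in (GiB + 1, 100 * KiB, 4 * KiB + 1, 1 * KiB):
        for min_alloc in (4 * KiB, 64 * KiB):
            print(
                f"size={size:>12} B  min_alloc={min_alloc // KiB:>2} KiB  "
                f"allocated={allocated_bytes(size, min_alloc):>12} B  "
                f"amplification={space_amplification(size, min_alloc):.4f}x"
            )
```

For a 1 KiB object, a 64 KiB allocation unit means 64x raw usage before replication or EC is even applied, while a 1 GiB object barely notices either unit size.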

Mark’s sheet is terrific for visualizing this:

https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?usp=sharing

A couple of releases ago, work was done to allow lowering the default min_alloc_size, largely because of the inefficiency with small RGW objects.  A subtle additional factor that is often missed is that RADOS writes full stripes.  That adds another layer of potential wasted space, which can be larger with misaligned or wider EC profiles than with replication.  
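The stripe effect can be sketched the same way.  This is a simplified, hypothetical model, not Ceph's actual accounting.  It assumes a single RADOS object, that an EC object is padded to whole stripes of k × stripe_unit, and that each of the k + m chunks is then rounded up to min_alloc_size on its OSD.  The 4 KiB stripe_unit is an assumption (a common default), and the function names are illustrative only.

```python
KiB = 1024


def ceil_to(n: int, unit: int) -> int:
    """Round n up to the next multiple of unit (0 stays 0)."""
    return -(-n // unit) * unit


def replicated_raw(size: int, replicas: int, min_alloc: int) -> int:
    """Raw bytes for a replicated object: each full copy is rounded up to min_alloc."""
    return replicas * ceil_to(size, min_alloc)


def ec_raw(size: int, k: int, m: int, stripe_unit: int, min_alloc: int) -> int:
    """Raw bytes for an EC object under the simplified full-stripe model."""
    stripe_width = k * stripe_unit
    # Pad the object to whole stripes; each data chunk gets one stripe_unit per stripe.
    stripes = -(-size // stripe_width)
    chunk = stripes * stripe_unit
    # Each of the k data and m coding chunks is allocated separately on its own OSD.
    return (k + m) * ceil_to(chunk, min_alloc)


if __name__ == "__main__":
    su, alloc = 4 * KiB, 4 * KiB
    for size in (1 * KiB, 16 * KiB):
        print(f"object {size // KiB} KiB:")
        print(f"  3x replica: {replicated_raw(size, 3, alloc) / size:.3f}x")
        for k, m in ((4, 2), (8, 3), (3, 2)):
            print(f"  EC {k}+{m}:    {ec_raw(size, k, m, su, alloc) / size:.3f}x")
```

Under this model a 16 KiB object costs 24 KiB raw with 4+2 (the nominal 1.5x), 44 KiB with 8+3 (2.75x rather than the nominal 1.375x), and 40 KiB with a misaligned 3+2 profile (2.5x rather than about 1.67x), versus 48 KiB with 3x replication.  For a 1 KiB object, 4+2 costs 24 KiB while 3x replication costs only 12 KiB.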


> On Feb 25, 2022, at 4:18 AM, Bobby <italienisch1987@xxxxxxxxx> wrote:
> 
> Thanks Anthony and Janne... exactly what I have been looking for!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



