Thanks for all this information. We are running version 16.2.7, but we also had this issue before upgrading to Pacific. We are using the default value for bluestore_min_alloc_size(_hdd) and are currently redeploying every OSD with a new 4TB HDD. The confusing part is that the space was already used before we actively started using the S3 service.

> * If you’ve ever run `rados bench` against any of your pools, there may be a bunch of leftover RADOS objects lying around taking up space. By default something like `rados ls -p mypool | egrep '^bench.*$'` will show these. Note that this may take a long time to run, and if the `rados bench` invocation specified a non-default job name the pattern may be different.

I did run rados bench in the past, but I cannot find any leftovers. In the past I changed many things while I was playing around with the cluster. Wouldn’t all of the issues you describe show up as pool usage in `ceph df`? I have 20TiB used as of now, but all pools combined only use a little more than 16TiB.
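For what it’s worth, these are roughly the checks I used; the pool name is a placeholder, and the grep pattern assumes the default object names that `rados bench` writes:

    # count leftover rados bench objects (a custom run name would need a different pattern)
    rados ls -p <pool> | egrep -c '^bench'

    # compare raw used capacity against what the pools themselves report
    ceph df detail

And as a rough sanity check of the min_alloc_size arithmetic described below, assuming the pre-Pacific 64KiB HDD default and 3x replication (the 1KiB object size is just an example):

    obj=1; alloc=64; rep=3   # sizes in KiB
    echo "$(( (alloc - obj % alloc) % alloc * rep )) KiB stranded per object"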
Thanks,
Hendrik

> On 10. Apr 2022, at 10:17, Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
> 
>>> Which version of Ceph was this deployed on? Did you use the default
>>> value for bluestore_min_alloc_size(_hdd)? If it's before Pacific and
>>> you used the default, then the min alloc size is 64KiB for HDDs, which
>>> could be causing quite a bit of usage inflation depending on the sizes
>>> of objects involved.
>> 
>> Is it recommended that if you have a pre-Pacific cluster you change this now before upgrading?
> 
> It’s baked into a given OSD at creation time. Changing it after the fact has no effect unless you rebuild the affected OSDs.
> 
> As noted above, significant space amplification can happen with RGW when storing a significant fraction of relatively small objects.
> 
> This sheet quantifies and visualizes this phenomenon nicely:
> 
> https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI
> 
> If your OSDs were deployed with bluestore_min_alloc_size=16KB, S3/Swift objects that aren’t roughly an even multiple of 16KB in size will allocate unused space. Think of the remainder in a modulus operation. E.g., if you write a 1KB object, BlueStore will allocate 16KB and you’ll waste 15KB. If you write a 15KB object, the wasted percentage is much lower. If you write a 17KB object, the absolute stranded space ratchets up again: 17 mod 16 leaves a 1KB remainder that spills into a second 16KB allocation unit, stranding another 15KB, but since you’ve also stored a full 17KB of payload, the _percentage_ of stranded space is lower than in the 1KB case. This rapidly becomes insignificant as S3 object size increases.
> 
> Note that this is multiplied by replication. With 3R, the total stranded space will be 3x the single-copy amount. With EC, depending on K and M, the total is potentially much larger, since the client object is sharded over a larger number of RADOS objects and thus OSDs.
> 
> There is a doc PR already in progress that explains this phenomenon.
> 
> If your population / distribution of objects is rich in relatively small objects, you can reclaim space by iteratively destroying and redeploying the OSDs that were created with the larger value.
> 
> RBD volumes tend to be much larger than min_alloc_size*, so this phenomenon is generally not significant for RBD pools.
> 
> Other factors that may be at play here:
> 
> * Your OSDs at 600GB are small by Ceph standards; we’ve seen in the past that this can result in a relatively large ratio of overhead to raw / payload capacity.
> 
> * ISTR having read that versioned objects / buckets and resharding operations can in some situations leave orphaned RADOS objects.
> 
> * If you’ve ever run `rados bench` against any of your pools, there may be a bunch of leftover RADOS objects lying around taking up space. By default something like `rados ls -p mypool | egrep '^bench.*$'` will show these. Note that this may take a long time to run, and if the `rados bench` invocation specified a non-default job name the pattern may be different.
> 
> — aad

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx