On Mon, Nov 27, 2017 at 5:52 PM, Igor Fedotov <ifedotov@xxxxxxx> wrote:
> Hi Cephers,
>
> a while ago Ceph got PR#15199 (https://github.com/ceph/ceph/pull/15199)
> that improved object space usage tracking.
>
> It introduced a 'dirty' extent map (in fact, an interval set) in an attempt
> to track the real space used by the object, considering overwrites, holes,
> etc.
>
> This extent map is kept at the OSD level within the object_info_t structure,
> which in turn uses object attributes as persistent storage.
>
> This is a step forward compared to the initial implementation, which relied
> on the maximum object offset.
>
> However, IMO there are a couple of significant issues related to this patch:
>
> 1) Extent tracking might be pretty expensive. Each write operation triggers
> both extent map encoding and its update at the object store level. It takes
> 16 bytes (offset+length) per single extent, which easily results in
> submitting several KB of additional data per write for scattered objects.
> For BlueStore this additional data weighs on the already overburdened KV
> store. Moreover, this tracking mechanism totally duplicates existing
> functionality, at least for BlueStore, which tracks a 'dirty' extent map on
> its own.
>
> 2) The approach is still inaccurate. It tracks logical space, not the actual
> physical allocation. E.g. a single-byte write might result in a 4K physical
> space allocation.
>
> My suggestion is to refactor this mechanism and force the Object Store to
> track and report back real object space usage. This looks easily doable and
> more accurate for BlueStore. And we can either roll back to the original
> implementation or migrate the new PR#15199 approach to FileStore.
>
> What do you think?

I haven't looked at the existing implementation in detail, but
extending the ObjectStore API to support this makes a lot of sense to
me. As you say, BlueStore ought to be able to just do it. FileStore
will need more work, but hopefully we can push down the extent
tracking without imposing new IO costs?
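
To put rough numbers on the two concerns above, here is a small Python
sketch. It is not Ceph code and does not reflect how object_info_t or
BlueStore actually encode anything: the function names are hypothetical,
the 16 bytes per extent is the figure quoted in the thread, and the 4096
byte allocation unit is simply assumed from the "4K" example.

```python
# Illustrative only: not Ceph code. All names here are hypothetical.

EXTENT_ENTRY_BYTES = 16   # offset + length, figure quoted in the thread
ALLOC_UNIT = 4096         # assumed allocation unit, from the "4K" example


def merge_extents(extents):
    """Coalesce overlapping or adjacent (offset, length) pairs."""
    merged = []
    for off, length in sorted(extents):
        if merged and off <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            merged.append((off, length))
    return merged


def encoded_size(extents):
    """Bytes needed to persist the extent map, ignoring any header."""
    return EXTENT_ENTRY_BYTES * len(extents)


def logical_bytes(extents):
    """Space usage as seen by logical extent tracking."""
    return sum(length for _, length in extents)


def allocated_bytes(extents, unit=ALLOC_UNIT):
    """Bytes consumed if every touched allocation unit is fully allocated."""
    units = set()
    for off, length in extents:
        if length == 0:
            continue
        units.update(range(off // unit, (off + length - 1) // unit + 1))
    return len(units) * unit


# A "scattered" object: 256 one-byte writes, one per 64 KiB stride.
scattered = merge_extents([(i * 65536, 1) for i in range(256)])
print(encoded_size(scattered))     # 4096: map bytes re-encoded per write
print(logical_bytes(scattered))    # 256
print(allocated_bytes(scattered))  # 1048576
```

Under these assumptions the extent map alone costs 4 KB to re-encode on
each write, and the logical figure (256 bytes) understates the assumed
physical allocation (1 MiB) by a wide margin, which is the gap a
store-level report would be expected to close.
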
-Greg

>
> Thanks,
>
> Igor
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html