Hi Sage,

Currently, the allocation hint is stored in the filesystem fsxattr (as on XFS), or assigned directly to onode_t in BlueStore and KStore. This works as expected in normal operation.

However, once the cluster starts recovering or backfilling objects, the recovered objects lose the allocation hint fsxattr permanently. This means that if a cluster experiences OSD up/down flapping, frequently accessed objects will no longer have the allocation hint xattr. As a result, those objects will not be fully allocated (1M/4M) and will eventually become fragmented. The same thing happens if an object is truncated (because the qemu discard option is enabled).

Therefore, we may want to preserve the allocation hint fsxattr during recovery. There are two alternatives:

1. Issue set_alloc_hint with the current object size during recovery, like
   t->set_alloc_hint(cid, oid, expected_object_size, expected_object_size)
   However, retrieving expected_object_size would be a problem. For a block image, it may be the object_size in the image metadata.

2. Transfer the hint fsxattr information directly to the recovered object. This is the option I would prefer.

I also see that a new flags parameter will be used to indicate whether the data will be compressed. So I propose a PR like this:
https://github.com/ceph/ceph/pull/9452

I think the issue may not affect BlueStore or KStore, because the allocation hint attribute can be overwritten by later set_alloc_hint ops?

Regards,
Ning Yao
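
P.S. To make the second alternative concrete, here is a minimal toy sketch of the idea. It is not Ceph code and not the code in the PR: ToyStore, PushPayload, build_push and apply_push are hypothetical names invented for illustration. It only shows that the hint survives recovery when it travels with the pushed object and is lost when it does not.

```cpp
// Toy model only: none of these types or functions exist in Ceph.
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the allocation hint kept with an object.
struct AllocHint {
    uint64_t expected_object_size;
    uint64_t expected_write_size;
    uint32_t flags;  // assumed slot for the new flags parameter
};

// Stand-in for an object on one OSD.
struct ToyObject {
    std::string data;
    bool has_hint;
    AllocHint hint;
};

using ToyStore = std::map<std::string, ToyObject>;

// Stand-in for what the source OSD pushes to the recovering OSD.
struct PushPayload {
    std::string oid;
    std::string data;
    bool has_hint;
    AllocHint hint;
};

// Build the push. With carry_hint == false this models the current
// behaviour described above (only the data is transferred).
PushPayload build_push(const ToyStore& src, const std::string& oid,
                       bool carry_hint) {
    const ToyObject& obj = src.at(oid);
    PushPayload p{oid, obj.data, false, AllocHint{0, 0, 0}};
    if (carry_hint && obj.has_hint) {
        p.has_hint = true;
        p.hint = obj.hint;
    }
    return p;
}

// Apply the push on the recovering side: the object is recreated with
// whatever hint (if any) came along in the payload.
void apply_push(ToyStore& dst, const PushPayload& p) {
    ToyObject obj{p.data, p.has_hint, p.hint};
    dst[p.oid] = obj;
}
```

Pushing the same object once with carry_hint set to false and once with it set to true leaves the first replica without a hint and the second with the original expected_object_size, which is the difference the second alternative is meant to close.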