Hey Igor,

I don't know much about the BlueStore allocator pattern, so I don't have a clear idea how difficult this is. But I *believe* we have a common pattern in RBD that might be worth optimizing for: the repeated-overwrites case. Often this would be some kind of journal header (for the FS stored on top, a database, or whatever) that results in the same 4KB logical block getting overwritten repeatedly. For instance, librbd might write out AAAA to an object, then do updates to the second block, resulting in a logical ABAA, ACAA, ADAA, etc.

I think, from my very limited understanding and what I heard when I asked this in standup, that right now the layout in BlueStore for this will tend to be something like

    AAAA
    A[A]AA...B
    A[A]AA...[B]...C
    A[A]AA...[B]...[C]...D

where the brackets indicate a deallocated [hole]. I expect that to happen (certainly for the first overwrite) as long as the incoming IO is large enough to trigger an immediate write to disk followed by a metadata update, rather than stuffing the data in the WAL and then doing a write-in-place.

So I wonder: is there any optimization to try and place incoming data so that it closes up holes and allows merging the extents/blobs (sorry, I forget the BlueStore internal terms)? If not, is this a feasible optimization to try and apply at some point? That way we could get an on-disk layout pattern more like

    AAAA
    A[A]AA...B
    ACAA...[B]
    A[C]AA...D

I don't know what the full value of something like this would actually be, but I was in a discussion recently where it came up that RBD causes much larger RocksDB usage than RGW does, thanks to the fragmented layouts it provokes. Cutting that down might be very good for our long-term performance.

-Greg
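
To make the two layouts above concrete, here is a toy sketch in Python. It is not BlueStore's allocator and does not use any Ceph API: the names `simulate`, `count_extents`, and `fill_holes`, the 4-block object, and the fixed gap before the first out-of-place block are all assumptions made for illustration. It only counts how many physically contiguous runs the object occupies after each overwrite of the second logical block, with and without a policy that prefers the hole left by an earlier overwrite.

```python
def count_extents(mapping):
    """Count runs of logical blocks whose physical blocks are consecutive."""
    extents = 1
    for prev, cur in zip(mapping, mapping[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def simulate(overwrites, fill_holes, logical_block=1, object_blocks=4):
    """Return the extent count after the initial write and after each overwrite.

    Toy model: every overwrite is written out of place, and the old copy is
    released only after the new block has been chosen, so an overwrite can
    never land in the hole it is about to create. logical_block must be >= 1.
    """
    mapping = list(range(object_blocks))  # logical -> physical, initially AAAA
    free = set()                          # physical blocks released so far
    next_unused = object_blocks + 8       # arbitrary gap, the "..." in the diagrams
    history = [count_extents(mapping)]
    for _ in range(overwrites):
        # Physical block that would sit directly after the left neighbour.
        ideal = mapping[logical_block - 1] + 1
        if fill_holes and ideal in free:
            new = ideal                   # close the hole, extents can merge
            free.remove(new)
        elif fill_holes and free:
            new = min(free)               # reuse some other released block
            free.remove(new)
        else:
            new = next_unused             # never reuse, keep moving forward
            next_unused += 1
        free.add(mapping[logical_block])
        mapping[logical_block] = new
        history.append(count_extents(mapping))
    return history


print(simulate(3, fill_holes=False))
print(simulate(3, fill_holes=True))
```

In this model the no-reuse policy stays at three extents after the first overwrite, while the hole-filling policy returns to a single extent on every second overwrite, which is the ACAA...[B] state in the second diagram. Whether fewer extents would translate into less RocksDB metadata in practice is exactly the open question in the message.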