I spent some time this week updating the current IO code to use the new extent and blob structures. The result is in

  https://github.com/ceph/ceph/pull/8928

It builds, and runs through the ceph_test_objectstore Synthetic test for a few thousand iterations before blowing up. There are bugs in the COW-related code that I haven't tracked down, and I'm not sure it's worth doing that since most of it will get rewritten again anyway.

This is based on Igor's original type patch, with some cleanups and updates (e.g., the bluestore_blob_t::map and mapbl helpers).

I'm flipping back and forth between looking over how the current write path is structured (allocate everything, squirrel away some magic flags to guide cow behavior, iterate over allocated extents and write them out) vs how Igor's ExtentManager is structured (punch lextent holes, allocate blob, write into blob) and I'm not very happy with either one. The new code captures almost none of the hard parts the old one handled (when to COW vs WAL vs write to new extent, partial block updates, etc.).

I think what would make the most sense is a simple breakdown of the write into three parts:

 middle - the portion of the write that is min_alloc_size aligned and can be written to a fresh region of disk. This is the easy part.
 front  - anything before middle that must either wal or cow+wal
 tail   - anything after middle that must either wal or cow+wal (or take a special append path)

Then the process would be something like

 - separate into the 3 regions
 - prepare blobs, lextents, and wal events for each region
 - allocate pextents for any new blobs
 - deref old blobs
 - submit io

That way the second step can do the compression and we'll end up with a list of new blobs and their associated buffers. Either we allocate the full size for a raw write, or compress and allocate something smaller.
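
To make the three-way split concrete, here is a rough sketch of the first step only (separating a write into front, middle, and tail). Region, WriteSplit, and split_write are made-up names for illustration, not anything in the branch. It also assumes that a write covering no full min_alloc_size unit is treated entirely as front, which is just one possible policy, and it ignores offset overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration; not BlueStore's actual structures.
struct Region {
  uint64_t offset;
  uint64_t length;
  bool empty() const { return length == 0; }
};

struct WriteSplit {
  Region front;   // unaligned head: must wal or cow+wal
  Region middle;  // min_alloc_size aligned: can go to a fresh region of disk
  Region tail;    // unaligned remainder: wal, cow+wal, or an append path
};

// Split the byte range [offset, offset + length) on min_alloc_size boundaries.
WriteSplit split_write(uint64_t offset, uint64_t length,
                       uint64_t min_alloc_size) {
  assert(min_alloc_size > 0);
  WriteSplit s{};  // all regions start out empty (zeroed)
  const uint64_t end = offset + length;
  // Round the start up and the end down to allocation-unit boundaries.
  const uint64_t mid_begin =
      (offset + min_alloc_size - 1) / min_alloc_size * min_alloc_size;
  const uint64_t mid_end = end / min_alloc_size * min_alloc_size;
  if (mid_begin >= mid_end) {
    // Assumption: with no whole aligned unit covered, the entire write is
    // handled as "front" (i.e. by the wal / cow+wal logic).
    s.front = {offset, length};
    return s;
  }
  s.front = {offset, mid_begin - offset};
  s.middle = {mid_begin, mid_end - mid_begin};
  s.tail = {mid_end, end - mid_end};
  return s;
}
```

For a write at offset 1000 of length 10000 with a 4096-byte min_alloc_size, this gives front = [1000, 4096), middle = [4096, 8192), and tail = [8192, 11000). The later steps (preparing blobs, lextents, and wal events per region) would consume these three regions.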

The lextent+blob representation is a lot more flexible than what we had before, allowing things like zero() and truncate() to be trivial updates of the lextent map. The question is whether we want to allow sparse, byte-granularity lextent -> blob mappings. It'll make the code a bit more complex when deciding whether we can write data into an existing blob. (OTOH, I think we have to have much of that anyway.)

I experimented a bit with a Checksummer class that captures what the ChecksumInterface was describing and plugs in crc32c and xxhash32 (so far). Not sure yet if we should add blob_t methods that use it directly (it has all the csum_data and related fields, so it'd be easier to use that way).

Anyway, I think there are a couple of ways to proceed...

 - the read path is unrelated to any of the write complexities--it just needs to faithfully return data based on the extent/blob structures. We can focus on structuring that nicely, since it's a simpler case.

 - I added a ref_map to blob_t to track which portions of a blob are still referenced (so that parts of it can be deallocated, or we can split, or whatever). Nothing is in place to do that, though; we'll want something like ExtentManager::deref_blob, I think.

 - get an ExtentManager-like interface in place so that it is easier to experiment with read/write/truncate strategies. I'm still not convinced we need the block_* methods if all IO is planned in a structured way before being submitted.

Igor, I think you said you're back from vacation next week? Let's touch base on Monday to make a plan?

sage