You're suggesting a logical address cache key (oid, offset) rather than a
physical cache key (LBA). That seems fine to me, provided that deletes and
renames properly purge the cache.

Sent from my iPhone. Please excuse all typos and autocorrects.

> On Aug 24, 2016, at 6:29 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>
>> On Wed, 24 Aug 2016, Allen Samuels wrote:
>> Yikes. You mean that blob ids are escaping the environment of the
>> lextent table. That's scary. What is the key for this cache? We probably
>> need to invalidate it or something.
>
> I mean that there will no longer be blob ids (except within the encoding
> of a particular extent map shard). Which means that when you write to A,
> clone A->B, and then read B, B's blob will no longer be the same as A's
> blob (as it is now in the bnode, or would have been with the -blobwise
> branch) and the cache won't be preserved.
>
> Which I *think* is okay...?
>
> sage
>
>>
>> Sent from my iPhone. Please excuse all typos and autocorrects.
>>
>>> On Aug 24, 2016, at 5:18 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>>>
>>> On Wed, 24 Aug 2016, Allen Samuels wrote:
>>>>> In that case, we should focus instead on sharing the ref_map *only* and
>>>>> always inline the forward pointers for the blob. This is closer to what
>>>>> we were originally doing with the enode. In fact, we could go back to the
>>>>> enode approach where it's just a big extent_ref_map and only used to defer
>>>>> deallocations until all refs are retired. The blob is then more ephemeral
>>>>> (local to the onode, an immutable copy if cloned), and we can more easily
>>>>> rejigger how we store it.
>>>>>
>>>>> We'd still have a "ref map" type structure for the blob, but it would only
>>>>> be used for counting the lextents that reference it, and we can
>>>>> dynamically build it when we load the extent map. If we impose the
>>>>> restriction that, whatever map sharding approach we take, we never share
>>>>> a blob across a shard, then the blobs are always local and "ephemeral"
>>>>> in the sense we've been talking about. The only downside here, I think,
>>>>> is that the write path needs to be smart enough not to create any new blob
>>>>> that spans whatever the current map sharding is (or, alternatively, to
>>>>> trigger a resharding if it does so).
>>>>
>>>> Not just a resharding but also a possible decompress/recompress cycle.
>>>
>>> Yeah.
>>>
>>> Oh, the other consequence of this is that we lose the unified blob-wise
>>> cache behavior we added a while back. That means that if you write a
>>> bunch of data to an rbd data object, then clone it, then read off the
>>> clone, it'll re-read the data from disk, because it'll be a different
>>> blob in memory (since we'll be making a copy of the metadata, etc.).
>>>
>>> Josh, Jason, do you have a sense of whether that really matters? The
>>> common case is probably someone who creates a snapshot and then backs it
>>> up, but that's going to be reading gobs of cold data off disk anyway, so
>>> I'm guessing it doesn't matter that a bit of warm data that just preceded
>>> the snapshot gets re-read.
>>>
>>> sage
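
For concreteness, here is a minimal sketch of the logical (oid, offset)
keying suggested at the top of this message. The names (LogicalCacheKey,
BufferCache, purge_object) are illustrative only, not the actual BlueStore
types; the point is just that delete and rename must purge every buffer
for the object, or a stale logical key could serve old data for a new
object that reuses the same name and offset.

#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct LogicalCacheKey {
  std::string oid;    // logical object id
  uint64_t offset;    // logical offset within the object
  bool operator<(const LogicalCacheKey& o) const {
    return std::tie(oid, offset) < std::tie(o.oid, o.offset);
  }
};

class BufferCache {
  std::map<LogicalCacheKey, std::vector<char>> buffers;
public:
  void insert(const LogicalCacheKey& k, std::vector<char> data) {
    buffers[k] = std::move(data);
  }
  const std::vector<char>* lookup(const LogicalCacheKey& k) const {
    auto it = buffers.find(k);
    return it == buffers.end() ? nullptr : &it->second;
  }
  // On delete or rename, drop every buffer belonging to the object;
  // otherwise a later object reusing the same name could hit stale data.
  void purge_object(const std::string& oid) {
    auto it = buffers.lower_bound(LogicalCacheKey{oid, 0});
    while (it != buffers.end() && it->first.oid == oid)
      it = buffers.erase(it);
  }
};

Note that with a logical key like this, the first read of a fresh clone B
necessarily misses (B is a new oid), which matches the re-read-from-disk
behavior described below for the blob-copy case.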
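
Similarly, the "share only the ref_map" / enode-style idea in the quoted
thread amounts to something like the sketch below. Again, ExtentRefMap is
an illustrative name, not the real bluestore type; it only shows how
deallocation of a shared physical extent would be deferred until the last
reference is retired.

#include <cstdint>
#include <map>
#include <utility>

class ExtentRefMap {
  // physical offset -> (length, refcount)
  std::map<uint64_t, std::pair<uint32_t, uint32_t>> refs;
public:
  // Take a reference on a physical extent (e.g. when a clone shares it).
  void get(uint64_t offset, uint32_t length) {
    auto& r = refs[offset];
    r.first = length;
    ++r.second;
  }
  // Drop a reference. Returns true only when the last reference is
  // retired, i.e. the caller may now release the space to the allocator;
  // until then the deallocation stays deferred.
  bool put(uint64_t offset) {
    auto it = refs.find(offset);
    if (it == refs.end())
      return false;          // unknown extent: nothing to free
    if (--it->second.second > 0)
      return false;          // still referenced by another onode/blob
    refs.erase(it);
    return true;             // last ref retired, safe to deallocate
  }
};

The per-blob "ref map" that counts referencing lextents would not need to
be persisted at all in this scheme; it can be rebuilt when the extent map
shard is loaded, as described above.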