Re: bluestore blobs REVISITED

On Wed, 24 Aug 2016, Allen Samuels wrote:
> > In that case, we should focus instead on sharing the ref_map *only* and 
> > always inline the forward pointers for the blob.  This is closer to what 
> > we were originally doing with the enode.  In fact, we could go back to the 
> > enode approach were it's just a big extent_ref_map and only used to defer 
> > deallocations until all refs are retired.  The blob is then more ephemeral 
> > (local to the onode, immutable copy if cloned), and we can more easily 
> > rejigger how we store it.
> > 
> > We'd still have a "ref map" type structure for the blob, but it would only 
> > be used for counting the lextents that reference it, and we can 
> > dynamically build it when we load the extent map.  If we impose the 
> > restriction that whatever the map sharding approach we take we never share 
> > a blob across a shard, then the blobs are always local and "ephemeral" 
> > in the sense we've been talking about.  The only downside here, I think, 
> > is that the write path needs to be smart enough to not create any new blob 
> > that spans whatever the current map sharding is (or, alternatively, 
> > trigger a resharding if it does so).
> 
> Not just a resharding but also a possible decompress/recompress cycle. 

Yeah.
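
(To make the "dynamically built ref map" idea above a bit more concrete, 
here's roughly what I'm picturing.  Caveat: the type and field names below 
are made up for illustration and aren't the actual BlueStore structures; 
the real thing would presumably reuse the existing extent_ref_map 
machinery.)

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <utility>
  #include <vector>

  struct lextent_t {               // one logical extent in the onode's map
    uint64_t logical_offset;
    uint32_t blob_offset, length;
    int      blob_id;              // which (local) blob it points into
  };

  // Per-blob ref counts over ranges of the blob, used only to decide when
  // a deallocation can finally happen; built in memory, never persisted.
  struct blob_ref_counter_t {
    std::map<uint64_t, std::pair<uint32_t, uint32_t>> ranges; // off -> (len, nref)

    void get(uint64_t off, uint32_t len) {
      auto p = ranges.find(off);
      if (p == ranges.end())
        ranges[off] = {len, 1};
      else
        ++p->second.second;
    }

    // returns true when the last reference to the range goes away, i.e.
    // the caller can now release (or defer-release) the underlying space
    bool put(uint64_t off, uint32_t len) {
      auto p = ranges.find(off);
      assert(p != ranges.end() && p->second.first == len);
      if (--p->second.second == 0) {
        ranges.erase(p);
        return true;
      }
      return false;
    }
  };

  // Rebuild the per-blob counts while decoding a shard of the extent map.
  // Because a blob never crosses a shard boundary, everything needed to do
  // this is local to the shard being loaded.
  void build_ref_counts(const std::vector<lextent_t>& shard_lextents,
                        std::map<int, blob_ref_counter_t>& refs) {
    for (auto& le : shard_lextents)
      refs[le.blob_id].get(le.blob_offset, le.length);
  }

The point being that put() returning true is the only place a deferred 
deallocation gets kicked off, and none of this needs to be encoded with 
the blob itself.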

Oh, the other consequence of this is that we lose the unified blob-wise 
cache behavior we added a while back.  That means that if you write a 
bunch of data to an rbd data object, then clone it, then read off the 
clone, it'll re-read the data from disk, because it'll be a different 
blob in memory (since we'll be making a copy of the metadata etc).
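
(To spell out why the cache goes away: the cached buffers hang off the 
in-memory blob, and the clone only gets a copy of the metadata.  The 
structs below are simplified stand-ins for illustration, not the real 
Blob/BufferSpace layout.)

  #include <cstdint>
  #include <map>
  #include <memory>
  #include <string>

  struct bluestore_blob_t {        // persistent blob metadata (extents, csums, ...)
  };

  struct BufferSpace {             // in-memory data cache
    std::map<uint64_t, std::string> buffers;   // blob offset -> cached bytes
  };

  struct Blob {
    bluestore_blob_t meta;         // copied when we clone
    BufferSpace bc;                // NOT carried over; tied to this instance
  };

  // Cloning copies only the metadata into a brand-new in-memory Blob, so
  // the new blob's BufferSpace starts out empty and the first read of the
  // clone goes to disk even if the source blob's data is still cached.
  std::shared_ptr<Blob> clone_blob(const Blob& src) {
    auto dst = std::make_shared<Blob>();
    dst->meta = src.meta;          // metadata copy
    return dst;                    // dst->bc is empty -> cold read
  }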

Josh, Jason, do you have a sense of whether that really matters?  The 
common case is probably someone who creates a snapshot and then backs it 
up, but it's going to be reading gobs of cold data off disk anyway, so I'm 
guessing it doesn't matter that a bit of warm data that just preceded the 
snapshot gets re-read.

sage
