On 12/22/2016 12:43 AM, Sage Weil wrote:
On Wed, 21 Dec 2016, Igor Fedotov wrote:
On 12/20/2016 11:15 PM, Sage Weil wrote:
It seems like we have a few common cases...
1) big blob, single ref (e.g., immutable large object, and then the entire
thing is referenced, modulo the last block).
2) tiny blob, single alloc unit. A counter is sufficient.
3) everything else in between...
We could probably get away with special-case efficient representations of
just 1 and 2?
I'd say 1) = the entire blob is referenced just once, and 2) = a single alloc
unit, partially referenced.
Then we need at minimum a bit flag for 1), an 'int' for 2), and a vector<int>
for 3).
Possible implementations are:

A. {
  int standalone_counter;      // covers 1) & 2); a value greater than the alloc unit means 1)
  vector<int> per_au_counters; // for 3), with the first counter stored in standalone_counter
}

vs.

B. {
  // reuse some blob flag for 1)
  vector<int> per_au_counters; // for 2) and 3)
}

vs.

C. {
  vector<int> per_au_counters; // for all cases
}
A. saves one allocation for 1) & 2) but wastes 4 bytes for 1); it also
requires some simple tricks to use both the standalone counter and the vector.
B. saves one allocation and 4 bytes for 1); unified counter-vector processing,
with some minor tricks around the flag.
C. needs one extra allocation for 1) & 2) and wastes 4 bytes for 1); unified
counter processing.
I'd prefer implementation B; a rough sketch follows.
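For illustration only, a minimal sketch of what B could look like
(hypothetical names, not the actual BlueStore type):

  #include <cstdint>
  #include <vector>

  // Hypothetical sketch of option B: a blob flag marks the fully-referenced
  // case 1); the per-AU vector is only populated for cases 2) and 3).
  struct blob_ref_map {
    static const uint32_t FLAG_FULLY_REFERENCED = 1; // case 1)
    uint32_t flags = 0;
    std::vector<uint32_t> per_au_counters; // empty when FLAG_FULLY_REFERENCED is set

    bool fully_referenced() const { return flags & FLAG_FULLY_REFERENCED; }
  };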
vector<int> is 24 bytes.
Yep, missed that...
So if we're worried about memory, I think we want to have an alternative
strategy. Something like:
uint8_t au_order; ///< size of allocation unit (min_alloc_size may change)
Unless we pack the struct, this byte field adds +8 bytes to the struct size on
a 64-bit build. Another point is heap allocator granularity: most probably it
will align the allocated struct to 8-byte boundaries anyway.
Hence we can use a full-width value instead of au_order.
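To illustrate the padding point, a minimal sketch assuming a typical LP64 ABI
(hypothetical struct names):

  #include <cstdint>

  // Assuming LP64: pointers are 8-byte aligned, so the lone byte field
  // drags in 7 bytes of padding.
  struct with_au_order {
    uint8_t au_order;  // 1 byte + 7 bytes padding
    long   *ref_by_au; // 8 bytes
  };
  struct counters_only {
    long *ref_by_au;   // 8 bytes
  };
  static_assert(sizeof(with_au_order) == 16, "byte field padded to pointer alignment");
  static_assert(sizeof(counters_only) == 8, "pointer alone needs no padding");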
Another concern is min_alloc_size changes in general: have we ever performed
any robust BlueStore testing against this feature?
It looks pretty dangerous and error-prone...
union {
  long *ref_by_au;       ///< dynamically allocated array of bytes per au
  long ref_in_single_au; ///< refs if blob is a single au (au_order=0?)
};
Something simple like that?
I suspect get_ref/put_ref will get quite a bit simpler and more efficient
with per-au buckets.
Yep.
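A rough sketch of what per-AU get_ref/put_ref might look like (a hypothetical
helper, not the actual BlueStore code; assumes length > 0):

  #include <cstdint>
  #include <vector>

  // Hypothetical per-AU refcount helper; au_size is the allocation unit size
  // and refs holds one counter per AU covered by the blob.
  struct au_ref_map {
    uint32_t au_size;
    std::vector<int64_t> refs;

    void get_ref(uint64_t offset, uint64_t length) {
      for (uint64_t au = offset / au_size; au <= (offset + length - 1) / au_size; ++au)
        ++refs[au];
    }

    // Returns the AUs whose count dropped to zero and can now be released.
    std::vector<uint64_t> put_ref(uint64_t offset, uint64_t length) {
      std::vector<uint64_t> released;
      for (uint64_t au = offset / au_size; au <= (offset + length - 1) / au_size; ++au)
        if (--refs[au] == 0)
          released.push_back(au);
      return released;
    }
  };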
Also, for compressed blobs, we can use the ref_in_single_au mode since we
can't deallocate it piecewise anyway...
+1
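That rule could be expressed as simply as this (hypothetical helper name):

  // Compressed blobs can only be freed as a whole, so a single counter
  // suffices regardless of how many AUs they span.
  bool use_single_counter(bool compressed, uint32_t blob_len, uint32_t au_size) {
    return compressed || blob_len <= au_size;
  }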
Also, please note that this change will probably impact our current Onode
encoding: spanning blobs need their ref counters to be persistent...
Thanks,
Igor