On 12/22/2016 12:43 AM, Sage Weil wrote:
On Wed, 21 Dec 2016, Igor Fedotov wrote:
On 12/20/2016 11:15 PM, Sage Weil wrote:
It seems like we have a few common cases...
1) big blob, single ref (e.g., immutable large object, and then the entire
thing is referenced, modulo the last block).
2) tiny blob, single alloc unit. A counter is sufficient.
3) everything else in between...
We could probably get away with special-case efficient representations of
just 1 and 2?
I'd say 1) = the entire blob is referenced just once, and 2) = a single alloc
unit, partially referenced.
Then we need at minimum a bit flag for 1), an 'int' for 2), and a vector<int>
for 3).
Possible implementations are:

A. {
  int standalone_counter;      // covers 1) & 2); a value greater than the alloc unit means 1)
  vector<int> per_au_counters; // for 3), with the first counter stored in standalone_counter
}

vs.

B. {
  // reuse some blob flag for 1)
  vector<int> per_au_counters; // for 2) and 3)
}

vs.

C. {
  vector<int> per_au_counters; // for all cases
}
A. saves one allocation for 1) & 2) but wastes 4 bytes for 1); it also
requires some simple tricks to use both the standalone counter and the vector.
B. saves one allocation and 4 bytes for 1); unified counter-vector processing,
with some minor tricks around the flag.
C. needs one extra allocation for 1) & 2) and wastes 4 bytes for 1); unified
counter processing.
I'd prefer implementation B; a rough sketch follows.
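For illustration only, a minimal sketch of what B could look like
(hypothetical names, not the actual BlueStore type):

  #include <cstdint>
  #include <vector>

  // Hypothetical sketch of option B: a blob flag marks the fully-referenced
  // case 1); the per-AU vector is only populated for cases 2) and 3).
  struct blob_ref_map {
    static const uint32_t FLAG_FULLY_REFERENCED = 1; // case 1)
    uint32_t flags = 0;
    std::vector<uint32_t> per_au_counters; // empty when FLAG_FULLY_REFERENCED is set

    bool fully_referenced() const { return flags & FLAG_FULLY_REFERENCED; }
  };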
vector<int> is 24 bytes.
Yep, missed that...
So if we're worried about memory, I think we want to have an alternative
strategy. Something like:
uint8_t au_order; ///< size of allocation unit (min_alloc_size may change)
Unless we pack the struct, this byte field adds +8 bytes to the struct size on
a 64-bit build. Another point is heap allocator granularity: most probably it
will align the allocated struct to 8-byte boundaries anyway.
Hence we can use a full-width value instead of au_order.
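To illustrate the padding point, a minimal sketch assuming a typical LP64 ABI
(hypothetical struct names):

  #include <cstdint>

  // Assuming LP64: pointers are 8-byte aligned, so the lone byte field
  // drags in 7 bytes of padding.
  struct with_au_order {
    uint8_t au_order;  // 1 byte + 7 bytes padding
    long   *ref_by_au; // 8 bytes
  };
  struct counters_only {
    long *ref_by_au;   // 8 bytes
  };
  static_assert(sizeof(with_au_order) == 16, "byte field padded to pointer alignment");
  static_assert(sizeof(counters_only) == 8, "pointer alone needs no padding");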
Another concern is min_alloc_size changes in general: have we ever performed
any robust BlueStore testing against this feature?
It looks pretty dangerous and error-prone...
union {
  long *ref_by_au;       ///< dynamically allocated array of bytes per au
  long ref_in_single_au; ///< refs if blob is a single au (au_order=0?)
};
Something simple like that?
I suspect get_ref/put_ref will get quite a bit simpler and more efficient
with per-au buckets.
Yep.
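A rough sketch of what per-AU get_ref/put_ref might look like (a hypothetical
helper, not the actual BlueStore code; assumes length > 0):

  #include <cstdint>
  #include <vector>

  // Hypothetical per-AU refcount helper; au_size is the allocation unit size
  // and refs holds one counter per AU covered by the blob.
  struct au_ref_map {
    uint32_t au_size;
    std::vector<int64_t> refs;

    void get_ref(uint64_t offset, uint64_t length) {
      for (uint64_t au = offset / au_size; au <= (offset + length - 1) / au_size; ++au)
        ++refs[au];
    }

    // Returns the AUs whose count dropped to zero and can now be released.
    std::vector<uint64_t> put_ref(uint64_t offset, uint64_t length) {
      std::vector<uint64_t> released;
      for (uint64_t au = offset / au_size; au <= (offset + length - 1) / au_size; ++au)
        if (--refs[au] == 0)
          released.push_back(au);
      return released;
    }
  };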
Also, for compressed blobs, we can use the ref_in_single_au mode since we
can't deallocate it piecewise anyway...
+1
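That rule could be expressed as simply as this (hypothetical helper name):

  // Compressed blobs can only be freed as a whole, so a single counter
  // suffices regardless of how many AUs they span.
  bool use_single_counter(bool compressed, uint32_t blob_len, uint32_t au_size) {
    return compressed || blob_len <= au_size;
  }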
Also, please note that this change will probably impact our current Onode
encoding: spanning blobs need their ref counters to be persistent...
Thanks,
Igor