> -----Original Message-----
> From: Snyder, Emile [mailto:emsnyder@xxxxxxxx]
> Sent: Friday, March 11, 2016 1:59 PM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: BlueStore Allocator.
>
> On 3/10/16, 6:06 PM, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Allen Samuels" <ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Allen.Samuels@xxxxxxxxxxx> wrote:
>
> <snip>
> >
> >Thus the total allocator system becomes:
> >
> >(1) An in-memory bitmap vector of 4K blocks (this is the equivalent of the FreeList of the current design).
> >(2) The auxiliary structures described above (~1MB of memory extra).
> >(3) Allocations set bits in the bitmap at the time of allocation.
> >(4) Deallocating operations are done after the KV commit.
> >(5) KV keys become chunks of the bitmap and are manipulated at the individual bit level via the merge capability of RocksDB (which we would duplicate in ZetaScale).
>
> I take it you mean the KV *values* become chunks of the freelist bitmap? While the key is... the index to the chunk of bitmap the value is? Since the merge operation is modifying the value for a given key.

Yes and yes. Exactly.

>
> How are you deciding the size of bitmap chunks to use?

Good question. I think there are a couple of competing factors, but it won't be difficult to find a good size. There is no reason the code can't be parameterized so as to easily allow some experimentation and measurement.

Factors that seem relevant to me are:

(1) Not so large as to expand the KV transaction commit size into a larger number of I/Os on the device. In other words, make it small. We should probably plan on at least two keys per transaction (1 alloc, 1 free).
(2) Not so small as to require multiple keys to be specified because your contiguous allocate or free now spans multiple keys.
(3) Not so large as to dramatically increase the amount of background compaction activity of RocksDB.
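To make factor (2) concrete, here is a rough Python sketch (not BlueStore code) of how a contiguous extent maps onto bitmap-chunk keys for a given chunk size. The function name keys_touched and the bits_per_key parameter are made up for illustration; the only thing taken from the thread is the 4K block size and the idea that the key is the index of the chunk and the value is the chunk's bits.

```python
BLOCK_SIZE = 4096  # 4K allocation unit, as in the thread


def keys_touched(offset, length, bits_per_key):
    """Split a block-aligned byte extent into per-key bit ranges.

    Hypothetical helper: returns a list of
    (key_index, first_bit, bit_count) tuples, one per bitmap chunk
    (i.e. per KV key) that the extent overlaps.
    """
    if offset % BLOCK_SIZE or length % BLOCK_SIZE or length <= 0:
        raise ValueError("extent must be a positive multiple of the block size")
    block = offset // BLOCK_SIZE       # first block covered by the extent
    remaining = length // BLOCK_SIZE   # number of blocks still to place
    pieces = []
    while remaining > 0:
        # Which chunk holds this block, and at which bit inside the chunk.
        key_index, first_bit = divmod(block, bits_per_key)
        # Take as many bits as fit before the chunk boundary.
        count = min(remaining, bits_per_key - first_bit)
        pieces.append((key_index, first_bit, count))
        block += count
        remaining -= count
    return pieces


if __name__ == "__main__":
    # 1 MiB extent with small 64-bit chunks: four keys in one transaction.
    print(len(keys_touched(0, 1 << 20, 64)))
    # Same extent with 1024-bit chunks (an example value): a single key.
    print(len(keys_touched(0, 1 << 20, 1024)))
```

The smaller the chunk, the more keys a single contiguous allocate or free has to name, which is the cost that factor (2) is weighing against factors (1) and (3).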
With that said, my first guess would be something that allows 4MB allocations to be within a single Key. That would be 2^22/2^12 => 2^10 Bits => 2^7 Bytes. So my first guess is 128 byte values. That seems to me to satisfy the above criteria.

>
> -Emile Snyder
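The sizing arithmetic above can be checked with a few lines of Python. This is only a sketch under the thread's assumptions (one bit per 4K block); value_size_bytes is a hypothetical name, not something in the code base.

```python
def value_size_bytes(max_extent_bytes, block_bytes=4096):
    """Bytes of bitmap needed so that one KV value can cover an extent
    of max_extent_bytes, at one bit per block (hypothetical helper)."""
    bits, rem = divmod(max_extent_bytes, block_bytes)
    if rem:
        raise ValueError("extent must be a multiple of the block size")
    return (bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # 4MB / 4K = 2^22 / 2^12 = 2^10 bits = 2^7 bytes
    print(value_size_bytes(4 * 2**20))
```

Note that a 4MB extent only fits in a single key when it starts on a chunk boundary; an unaligned 4MB extent would still span two keys at this value size.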