RE: bluestore onode diet and encoding overhead

On Tue, 12 Jul 2016, Sage Weil wrote:
> On Tue, 12 Jul 2016, Somnath Roy wrote:
> > Mark,
> > Recently the default allocator was changed to Bitmap, and I see that 
> > it returns a negative value only in the following case:
> > 
> >   count = m_bit_alloc->alloc_blocks_res(nblks, &start_blk);
> >   if (count == 0) {
> >     return -ENOSPC;
> >   }
> > 
> > So it seems it may not be memory; the db partition is running out of 
> > space (?). I have never hit this myself, probably because I was running 
> > with a 100GB db partition. Even after the onode diet, the metadata 
> > written to the db per onode starts at ~1K and over time grows to >4K or 
> > so (I checked for 4K RW); it grows as the extents grow. So 8GB may not 
> > be enough. If this is true, the next challenge is how to automatically 
> > size (or document) the rocksdb partition based on the data partition 
> > size. For example, in the ZS case we calculated that we need ~9G of db 
> > space per TB. We need to do a similar calculation for rocksdb as well.
> 
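Just to make the sizing question concrete, a rough back-of-the-envelope
helper could look like the sketch below. The ~9G/TB figure is the ZS
estimate quoted above; the function name and the rocksdb ratio are only
placeholders, since the real per-TB ratio still has to be measured:

  #include <cstdint>

  // Hypothetical sizing helper: estimate the db partition size needed for
  // a data partition of a given size, from a measured "db bytes per TB of
  // data" ratio (~9 GB/TB in the ZS estimate above).
  static uint64_t estimate_db_partition_bytes(uint64_t data_bytes,
                                              uint64_t db_bytes_per_tb)
  {
    const uint64_t TB = 1ull << 40;
    // round up so a partially filled TB still gets its share of db space
    uint64_t tbs = (data_bytes + TB - 1) / TB;
    return tbs * db_bytes_per_tb;
  }

  // e.g. a 4 TB data device at ~9 GB/TB:
  //   estimate_db_partition_bytes(4ull << 40, 9ull << 30)  -> 36 GiB

Of course, as the numbers above show, the per-onode metadata grows as
extents grow, so a static formula like this can only be a rough guide.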
> We can precalculate or otherwise pre-size the db partition because we 
     ^
     can't

> don't know what kind of data the user is going to store, and that data 
> might even be 100% omap.  This is why BlueStore and BlueFS balance their 
> free space--so that the bluefs/db usage can grow and shrink dynamically as 
> needed.
> 
> We'll need to implement something similar for ZS.
> 
> sage
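For reference, the grow/shrink balancing described above is conceptually
something like the sketch below. This is not the actual BlueStore code,
just an illustration of the gift/reclaim idea; the names and thresholds
are made up:

  #include <algorithm>
  #include <cstdint>

  // Hypothetical sketch of periodic free-space balancing between the main
  // (data) device and the space handed to BlueFS for the db: gift extents
  // to BlueFS when its share of free space gets too small, and reclaim
  // them when it gets too large.
  struct FreeSpaceBalancer {
    double min_db_ratio = 0.02;  // gift when bluefs free < 2% of total free
    double max_db_ratio = 0.10;  // reclaim when bluefs free > 10% of total

    // Returns bytes to gift to BlueFS (>0), to reclaim from it (<0),
    // or 0 if the split is already within bounds.
    int64_t balance(uint64_t main_free, uint64_t bluefs_free) const {
      uint64_t total = main_free + bluefs_free;
      if (total == 0)
        return 0;
      double ratio = (double)bluefs_free / (double)total;
      if (ratio < min_db_ratio) {
        // db space is too tight: grow it out of the main device's free space
        uint64_t want = (uint64_t)(min_db_ratio * total) - bluefs_free;
        return (int64_t)std::min(want, main_free);
      }
      if (ratio > max_db_ratio) {
        // db space is oversized: give some back to the main device
        return -(int64_t)(bluefs_free - (uint64_t)(max_db_ratio * total));
      }
      return 0;
    }
  };

Run periodically (or on allocation failure), something like this lets the
db space grow as metadata grows and shrink back later, instead of relying
on a fixed pre-sized partition.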