Yeah, agreed. I forgot the user can write any amount of omap data. We will discuss internally how we can handle that with ZS.

Thanks & Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
Sent: Tuesday, July 12, 2016 9:57 AM
To: Somnath Roy
Cc: Mark Nelson; Igor Fedotov; ceph-devel
Subject: RE: bluestore onode diet and encoding overhead

On Tue, 12 Jul 2016, Sage Weil wrote:
> On Tue, 12 Jul 2016, Somnath Roy wrote:
> > Mark,
> > Recently, the default allocator is changed to Bitmap and I saw it is
> > returning < 0 return value only in the following case.
> >
> >   count = m_bit_alloc->alloc_blocks_res(nblks, &start_blk);
> >   if (count == 0) {
> >     return -ENOSPC;
> >   }
> >
> > So, it seems it may not be the memory but db partition is getting
> > out of space (?). I never faced it so far as I was running with
> > 100GB of db partition may be. The amount of metadata write going on
> > to the db even after onode diet is starting from ~1K and over time
> > it is reaching > 4k or so (I checked for 4K RW). It is growing as
> > extents are growing. So, 8 GB may not be enough. If this is true,
> > next challenge is how to automatically (or document) the size of
> > rocksdb db partition based on the data partition size. For example,
> > in the ZS case, we have calculated that we need ~9G db space per TB.
> > We need to do similar calculation for rocksdb as well.
>
> We can precalculate or otherwise pre-size the db partition because we
     ^ can't
> don't know what kind of data the user is going to store, and that data
> might even be 100% omap. This is why BlueStore and BlueFS balance
> their free space--so that the bluefs/db usage can grow and shrink
> dynamically as needed.
>
> We'll need to implement something similar for ZS.
>
> sage
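For illustration only, the kind of free-space balancing Sage describes (letting the db area grow and shrink against the data area instead of pre-sizing it) could be sketched roughly as below. This is a toy model and not BlueStore's or BlueFS's actual code: the `SpacePools` struct, the `rebalance` and `total_for_free_pct` functions, and the percentage thresholds are all hypothetical names and parameters made up for this sketch. It uses plain integer arithmetic and assumes `db_used * 100` fits in 64 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical toy model of two areas sharing one device.
struct SpacePools {
    uint64_t db_total;   // bytes currently assigned to the db/metadata area
    uint64_t db_used;    // bytes of db_total that are in use
    uint64_t data_free;  // unallocated bytes in the data area
};

// Smallest db_total for which at least free_pct percent of it is free,
// given `used` bytes in use: ceil(used * 100 / (100 - free_pct)).
uint64_t total_for_free_pct(uint64_t used, unsigned free_pct) {
    assert(free_pct < 100);
    uint64_t denom = 100 - free_pct;
    return (used * 100 + denom - 1) / denom;
}

// Keep the db area's free share between min_free_pct and max_free_pct.
// Returns bytes moved: positive means data -> db, negative means db -> data.
int64_t rebalance(SpacePools& p, unsigned min_free_pct, unsigned max_free_pct) {
    assert(min_free_pct <= max_free_pct && max_free_pct < 100);
    assert(p.db_used <= p.db_total);

    uint64_t lo = total_for_free_pct(p.db_used, min_free_pct);
    uint64_t hi = total_for_free_pct(p.db_used, max_free_pct);

    if (p.db_total < lo) {
        // db area is running short: hand over free data space,
        // limited by what the data area can actually spare.
        uint64_t gift = std::min(lo - p.db_total, p.data_free);
        p.db_total += gift;
        p.data_free -= gift;
        return static_cast<int64_t>(gift);
    }
    if (p.db_total > hi) {
        // db area holds more free space than needed: give it back.
        uint64_t reclaim = p.db_total - hi;
        p.db_total -= reclaim;
        p.data_free += reclaim;
        return -static_cast<int64_t>(reclaim);
    }
    return 0;  // already within the target band
}
```

With thresholds of 20% and 50%, a db area of 8 units with 7 used would take 1 unit from the data area, while a db area of 100 units with only 10 used would hand 80 back. If the data area has nothing left to give, the db allocator would still run out of space, which is the -ENOSPC case quoted above.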