> -----Original Message-----
> From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> Sent: Friday, December 16, 2016 2:33 PM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: RE: BlueStore in-memory Onode footprint
>
> On Fri, 16 Dec 2016, Allen Samuels wrote:
> > I'm not sure what the conclusion from this is.
> >
> > The point of the sharding exercise was to eliminate the need to
> > serialize/deserialize all 1024 Extents/Blobs/SharedBlobs on each I/O
> > transaction.
> >
> > This shows that a fully populated oNode with ALL of the shards present
> > is large. But that ought to be a rare occurrence.
> >
> > This test shows that each Blob is 248 bytes and that each SharedBlob
> > is 216 bytes. That matches the sizeof(...), so the MemPool logic got
> > the right answer! Yay!
> >
> > Looking at the Blob I see:
> >
> >   bluestore_blob_t   72 bytes
> >   bufferlist         88 bytes
> >   extent ref map     64 bytes
> >
> > That's most of the 248. I suspect that trying to fix this will require
> > a new strategy, etc.
>
> Well, one thing we might consider: right now we prune cache content at
> the granularity of the onode. We might want to put the shards in an LRU
> too and prune old shards...

Not sure I really see the value of that. Seems to me that Igor's test case
is pretty rare, and in this case it doesn't need to work well -- it just
needs to work. As long as the pruning is based on actual memory consumption
rather than the onode count, I think we're fine (a rough sketch of what
byte-budgeted shard trimming could look like is appended below).

Realistically, retaining deserialized onodes is only valuable for
sequential read/write cases, in which case you don't need a large number of
onodes -- just a few per client connection ought to do it.

> sage
>
> > Allen Samuels
> > SanDisk | a Western Digital brand
> > 2880 Junction Avenue, San Jose, CA 95134
> > T: +1 408 801 7030 | M: +1 408 780 6416
> > allen.samuels@xxxxxxxxxxx
> >
> > > -----Original Message-----
> > > From: ceph-devel-owner@xxxxxxxxxxxxxxx
> > > [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> > > Sent: Friday, December 16, 2016 7:20 AM
> > > To: Igor Fedotov <ifedotov@xxxxxxxxxxxx>
> > > Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> > > Subject: Re: BlueStore in-memory Onode footprint
> > >
> > > On Fri, 16 Dec 2016, Igor Fedotov wrote:
> > > > Hey All!
> > > >
> > > > Recently I realized that I'm unable to fit all my onodes (32768
> > > > objects / 4MB each / 4K alloc unit / no csum) in a 15G RAM cache.
> > > >
> > > > Hence I decided to estimate the Onode's in-memory size.
> > > >
> > > > At first I filled the 4MB object with a single 4M write -- mempools
> > > > indicate ~5K mem usage for the total onode metadata. Good enough.
> > > >
> > > > Then I refilled that object with 4K writes. Resulting mem usage =
> > > > 574K!!! The Onode itself is 704 bytes (1 object), and 4120 other
> > > > metadata items occupy all the other space.
> > > >
> > > > Next I removed SharedBlob from the mempools. Resulting mem usage =
> > > > 355K, the same Onode size, and 3096 other metadata objects. Hence
> > > > we had 1024 SharedBlob instances that took ~220K.
> > > >
> > > > And finally I removed Blob instances from the measurements.
> > > > Resulting mem usage = 99K and 2072 other objects. Hence the Blob
> > > > instances take another ~250K.
> > >
> > > Yikes!
> > >
> > > BTW you can get a full breakdown by type with 'mempool debug = true'
> > > in ceph.conf (-o 'mempool debug = true' on the vstart.sh command
> > > line) without having to recompile.
> > > Do you mind repeating the test and including the full breakdown?
> > >
> > > > Yeah, that's the worst case (actually enabling csum will give even
> > > > more mem use), but shouldn't we revisit some Onode internals given
> > > > such numbers?
> > >
> > > Yep!
> > >
> > > sage
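
Sanity check on the numbers above: 1024 Blobs x 248 bytes and 1024
SharedBlobs x 216 bytes come to roughly 250K and 220K respectively,
matching the deltas Igor measured, and 32768 onodes x ~574K is roughly 18G,
which is why the set no longer fits in a 15G cache.

Below is a rough sketch of the byte-budgeted, shard-level LRU trimming
discussed above. It is illustrative only -- Shard, ShardLRU, and their
members are made-up names for this sketch, not actual BlueStore interfaces.

  // A sketch only: cache decoded extent-map shards in an LRU and trim them
  // against a byte budget.  All names here are illustrative; this is not
  // BlueStore's actual onode/extent-map code.
  #include <cstddef>
  #include <cstdint>
  #include <iostream>
  #include <list>
  #include <memory>
  #include <string>
  #include <unordered_map>
  #include <vector>

  struct Shard {
    std::string oid;      // owning object
    uint32_t offset = 0;  // shard start offset within the object's extent map
    size_t bytes = 0;     // decoded Extent/Blob/SharedBlob footprint
    bool dirty = false;   // dirty shards must be encoded back before dropping
  };

  class ShardLRU {
  public:
    // Mark a shard recently used and (re)account its decoded size.
    void touch(const std::shared_ptr<Shard>& s) {
      auto it = pos_.find(s.get());
      if (it != pos_.end()) {
        total_ -= (*it->second)->bytes;
        lru_.erase(it->second);
      }
      lru_.push_front(s);
      pos_[s.get()] = lru_.begin();
      total_ += s->bytes;
    }

    // Drop cold, clean shards until the decoded bytes fit the budget.  The
    // caller frees their decoded state; they are re-decoded from KV on the
    // next access.
    std::vector<std::shared_ptr<Shard>> trim(size_t budget_bytes) {
      std::vector<std::shared_ptr<Shard>> victims;
      while (total_ > budget_bytes && !lru_.empty()) {
        std::shared_ptr<Shard> s = lru_.back();
        if (s->dirty)
          break;  // simplest possible policy: stop at the first dirty shard
        total_ -= s->bytes;
        pos_.erase(s.get());
        lru_.pop_back();
        victims.push_back(std::move(s));
      }
      return victims;
    }

    size_t bytes() const { return total_; }  // basis for the trim decision

  private:
    std::list<std::shared_ptr<Shard>> lru_;  // front = most recently used
    std::unordered_map<Shard*,
                       std::list<std::shared_ptr<Shard>>::iterator> pos_;
    size_t total_ = 0;  // decoded bytes currently cached
  };

  int main() {
    ShardLRU cache;
    for (int i = 0; i < 4; ++i) {
      auto s = std::make_shared<Shard>();
      s->oid = "object1";
      s->offset = i * (1u << 20);  // pretend each shard covers 1MB
      s->bytes = 140 * 1024;       // ~140K of decoded Blob/SharedBlob state
      cache.touch(s);
    }
    auto evicted = cache.trim(300 * 1024);  // keep ~300K of decoded shards
    std::cout << "evicted " << evicted.size() << " shard(s), "
              << cache.bytes() << " bytes still cached\n";
  }

Trimming by decoded bytes rather than by onode count would let a cold,
fully sharded onode give back most of its ~574K while its hot shards (and
the onode itself) stay cached; dirty shards would still have to be encoded
and written back before they could be dropped.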