RE: BlueStore in-memory Onode footprint

> -----Original Message-----
> From: Sage Weil [mailto:sage@xxxxxxxxxxxx]
> Sent: Friday, December 16, 2016 2:33 PM
> To: Allen Samuels <Allen.Samuels@xxxxxxxxxxx>
> Cc: Igor Fedotov <ifedotov@xxxxxxxxxxxx>; ceph-devel <ceph-
> devel@xxxxxxxxxxxxxxx>
> Subject: RE: BlueStore in-memory Onode footprint
> 
> On Fri, 16 Dec 2016, Allen Samuels wrote:
> > I'm not sure what the conclusion from this is.
> >
> > The point of the sharding exercise was to eliminate the need to
> > serialize/deserialize all 1024 Extents/Blobs/SharedBlobs on each I/O
> > transaction.
> >
> > This shows that a fully populated oNode with ALL of the shards present
> > adds up to a large amount of memory. But that ought to be a rare
> > occurrence.
> >
> > This test shows that each Blob is 248 bytes and that each SharedBlob
> > is
> > 216 bytes. That matches the sizeof(...), so the MemPool logic got the
> > right answer! Yay!
> >
> > Looking at the Blob I see:
> >
> > bluestore_blob_t    72 bytes
> > bufferlist          88 bytes
> > extent ref map      64 bytes
> >
> > That's most of the 248. I suspect that trying to fix this will require a
> > new strategy, etc.
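
(As a cross-check -- my arithmetic, using the per-instance sizes above and
Igor's deltas below -- the numbers are internally consistent:

    72 + 88 + 64              =  224 of the 248 bytes/Blob accounted for
    1024 Blobs x 248 B       ~=  254K  (vs. the ~250K drop Igor measured)
    1024 SharedBlobs x 216 B ~=  221K  (vs. the ~220K drop Igor measured)

so the mempool accounting and the sizeof() figures really do agree.)
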
> 
> Well, one thing we might consider: right now we prune cache contents at the
> granularity of the onode.  We might want to put the shards in an LRU too and
> prune old shards...

Not sure I really see the value of that. Seems to me that Igor's test case is pretty rare. 
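
(Rough arithmetic on Igor's numbers does explain why the 32768 onodes didn't fit in 15G, though:

    15G / ~574K per fully-populated onode  ~=  ~27K onodes   (< 32768)
    15G /   ~5K per single-4MB-write onode ~=   ~3M onodes

so it's really only the worst case that blows the budget.)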

I think in this case, it doesn't need to work well -- it just needs to work.

As long as the pruning is based on the actual memory consumption rather than the oNode count, I think we're fine.
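
Something like this minimal sketch is all I have in mind -- the names
(ShardCache, CachedShard) are made up for illustration, not BlueStore's
actual cache code -- i.e. trim against a byte budget fed from the mempool
accounting rather than against an item count:

#include <cstddef>
#include <list>
#include <string>

struct CachedShard {
  std::string key;
  size_t bytes;                  // deserialized footprint, e.g. from mempools
};

class ShardCache {
  std::list<CachedShard> lru;    // front = most recently used
  size_t total_bytes = 0;
  const size_t max_bytes;        // byte budget, not an item-count budget
public:
  explicit ShardCache(size_t budget) : max_bytes(budget) {}

  void insert(std::string key, size_t bytes) {
    // (index/lookup for the "touch" path omitted for brevity)
    lru.push_front({std::move(key), bytes});
    total_bytes += bytes;
    trim();
  }

  void trim() {
    // Evict least-recently-used entries until we fit the byte budget --
    // one fully-sharded onode can cost as much as ~100 small ones.
    while (total_bytes > max_bytes && !lru.empty()) {
      total_bytes -= lru.back().bytes;
      lru.pop_back();
    }
  }
};

int main() {
  ShardCache cache(1024 * 1024);          // 1M budget for the example
  cache.insert("onode#1", 574 * 1024);    // Igor's worst case
  cache.insert("onode#2", 5 * 1024);      // the single-4MB-write case
  cache.insert("onode#3", 574 * 1024);    // forces eviction of onode#1
}

A count-based trim treats a 574K onode and a 5K onode the same; a byte-based
one doesn't.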

Realistically, retaining deserialized oNodes is only valuable for sequential read/write cases, in which case you don't need
a large number of oNodes -- just a few per client connection ought to do it.

> 
> sage
> 
> >
> > Allen Samuels
> > SanDisk | a Western Digital brand
> > 2880 Junction Avenue, San Jose, CA 95134
> > T: +1 408 801 7030 | M: +1 408 780 6416 allen.samuels@xxxxxxxxxxx
> >
> >
> > > -----Original Message-----
> > > From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> > > owner@xxxxxxxxxxxxxxx] On Behalf Of Sage Weil
> > > Sent: Friday, December 16, 2016 7:20 AM
> > > To: Igor Fedotov <ifedotov@xxxxxxxxxxxx>
> > > Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> > > Subject: Re: BlueStore in-memory Onode footprint
> > >
> > > On Fri, 16 Dec 2016, Igor Fedotov wrote:
> > > > Hey All!
> > > >
> > > > Recently I realized that I'm unable to fit all my onodes (32768
> > > > objects / 4MB each / 4K alloc unit / no csum) in a 15G RAM cache.
> > > >
> > > > Hence decided to estimate Onode in-memory size.
> > > >
> > > > At first I filled a 4MB object with a single 4MB write - mempools
> > > > indicate ~5K mem usage for the total onode metadata. Good enough.
> > > >
> > > > Secondly I refilled that object with 4K writes. Resulting mem usage =
> > > > 574K!!! The Onode itself is 704 bytes in 1 object, and 4120 other
> > > > metadata items occupy all the other space.
> > > >
> > > > Then I removed SharedBlob from mempools. Resulting mem usage = 355K.
> > > > The same Onode size, and 3096 other metadata objects. Hence we had
> > > > 1024 SharedBlob instances that took ~220K.
> > > >
> > > >
> > > > And finally I removed Blob instances from the measurements. Resulting
> > > > mem usage = 99K, and 2072 other objects. Hence Blob instances take
> > > > another ~250K.
> > >
> > > Yikes!
> > >
> > > BTW you can get a full breakdown by type with 'mempool debug = true'
> > > in ceph.conf (-o 'mempool debug = true' on vstart.sh command line)
> > > without having to recompile.  Do you mind repeating the test and
> > > including the full breakdown?
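
[For reference, that looks something like the following; the [global] section
placement and the vstart.sh invocation details are my assumptions, the option
itself is as Sage describes:

    # ceph.conf
    [global]
        mempool debug = true

    # or, without editing ceph.conf, on the vstart.sh command line:
    ./vstart.sh -o 'mempool debug = true'

]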
> > >
> > > > Yeah, that's the worst case (actually enabling csum will give even
> > > > more mem use), but shouldn't we revisit some Onode internals due to
> > > > such numbers?
> > >
> > > Yep!
> > >
> > > sage