Hello,

On Wed, 17 Feb 2016 09:23:11 -0000 Nick Fisk wrote:

> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> > Of Christian Balzer
> > Sent: 17 February 2016 04:22
> > To: ceph-users@xxxxxxxxxxxxxx
> > Cc: Piotr Wachowicz <piotr.wachowicz@xxxxxxxxxxxxxxxxxxx>
> > Subject: Re: SSDs for journals vs SSDs for a cache tier, which is
> > better?
> >
> > [snip]
> >
> > > I'm sure both approaches have their own merits, and might be better
> > > for some specific tasks, but with all other things being equal, I
> > > would expect that using SSDs as the "Writeback" cache tier should,
> > > on average, provide better performance than using the same SSDs for
> > > Journals.
> > > Specifically in the area of read throughput/latency.
> > >
> > Cache tiers (currently) only work well if all your hot data fits into
> > them.
> > In which case you'd be even better off with a dedicated SSD pool for
> > that data.
> >
> > Because (currently) Ceph has to promote a full object (4MB by default)
> > to the cache for each operation, be it read or write.
> > That means the first time you want to read a 2KB file in your RBD
> > backed VM, Ceph has to copy 4MB from the HDD pool to the SSD cache
> > tier.
> > This has of course a significant impact on read performance; in my
> > crappy test cluster reading cold data is half as fast as using the
> > actual non-cached HDD pool.
> >
> Just a FYI, there will most likely be several fixes/improvements going
> into Jewel which will address most of these problems with caching.
> Objects will now only be promoted if they are hit several times
> (configurable) and, if it makes it in time, a promotion throttle to
> stop too many promotions hindering cluster performance.
>
Ah, both of these would be very nice indeed, especially since the first
one is something that's supposedly already present (but broken).

The second one, if done right, will probably be a game changer. Robert
LeBlanc and I will be most pleased.

> However in the context of this thread, Christian is correct: SSD
> journals first, and then caching if needed.
>
Yeah, thus my overuse of "currently". ^o^

Christian

> > And once your cache pool has to evict objects because it is getting
> > full, it has to write out 4MB for each such object to the HDD pool.
> > Then read it back in later, etc.
> >
> > > The main difference, I suspect, between the two approaches is that
> > > in the case of multiple HDDs (multiple ceph-osd processes), all of
> > > those processes share access to the same shared SSD storing their
> > > journals.
> > > Whereas it's likely not the case with Cache tiering, right? Though
> > > I must say I failed to find any detailed info on this. Any
> > > clarification will be appreciated.
> > >
> > In your specific case writes to the OSDs (HDDs) will be (at least)
> > 50% slower if your journals are on disk instead of on the SSD.
> > (Which SSDs do you plan to use anyway?)
> > I don't think you'll be happy with the resulting performance.
> >
> > Christian.
> >
> > > So, is the above correct, or am I missing some pieces here? Any
> > > other major differences between the two approaches?
> > >
> > > Thanks.
> > > P.
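
P.S. for anyone who wants to experiment with the cache tiering knobs
mentioned above, here is a rough sketch. The tier commands are the ones
from the current docs; the recency setting is how I understand the
promotion changes being discussed for Jewel, so names and defaults may
well change before release, and the pool names are only placeholders.

  # hypothetical pools: "cache" (SSDs) layered over "rbd" (HDDs)
  ceph osd tier add rbd cache
  ceph osd tier cache-mode cache writeback
  ceph osd tier set-overlay rbd cache

  # keep hit sets so promotion can require more than a single access
  ceph osd pool set cache hit_set_type bloom
  ceph osd pool set cache hit_set_count 4
  ceph osd pool set cache hit_set_period 1200
  # only promote on read if the object shows up in N recent hit sets
  ceph osd pool set cache min_read_recency_for_promote 2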
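
And on the journal side of the discussion, putting a spinning OSD's
journal on an SSD partition is a one-liner at OSD creation time with
ceph-disk (device names below are examples only):

  # /dev/sdb = HDD for the data, /dev/sda5 = SSD partition for the journal
  ceph-disk prepare /dev/sdb /dev/sda5

That keeps the journal double-write off the HDD, which is where the
"(at least) 50% slower" above comes from.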

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/