> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Christian Balzer
> Sent: 17 February 2016 04:22
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Piotr Wachowicz <piotr.wachowicz@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: SSDs for journals vs SSDs for a cache tier, which is
> better?
>
>
> Hello,
>
> On Tue, 16 Feb 2016 18:56:43 +0100 Piotr Wachowicz wrote:
>
> > Hey,
> >
> > Which one's "better": to use SSDs for storing journals, vs to use them
> > as a writeback cache tier? All other things being equal.
> >
> Pears are better than either oranges or apples. ^_-
>
> > The usecase is a 15 osd-node cluster, with 6 HDDs and 1 SSD per node.
> > Used for block storage for a typical 20-hypervisor OpenStack cloud
> > (with bunch of VMs running Linux). 10GigE public net + 10 GigE
> > replication network.
> >
> > Let's consider both cases:
> > Journals on SSDs - for writes, the write operation returns right after
> > data lands on the Journal's SSDs, but before it's written to the
> > backing HDD. So, for writes, SSD journal approach should be comparable
> > to having a SSD cache tier.
> Not quite, see below.
>
> > In both cases we're writing to an SSD (and to replica's SSDs), and
> > returning to the client immediately after that.
> > Data is only flushed to HDD later on.
> >
> Correct. Note that the flushing is done by the OSD process submitting
> this write to the underlying device/FS.
> It doesn't go from the journal to the OSD storage device, which has the
> implication that with default settings and plain HDDs you quickly wind up
> being limited to what your actual HDDs can handle in a sustained
> manner.
>
> >
> > However for reads (of hot data) I would expect a SSD Cache Tier to be
> > faster/better. That's because, in the case of having journals on SSDs,
> > even if data is in the journal, it's always read from the (slow)
> > backing disk anyway, right?
> > But with a SSD cache tier, if the data is hot, it would be read from
> > the (fast) SSD.
> >
> It will be read from the even faster pagecache if it is a sufficiently hot
> object and you have sufficient RAM.
>
> > I'm sure both approaches have their own merits, and might be better
> > for some specific tasks, but with all other things being equal, I
> > would expect that using SSDs as the "Writeback" cache tier should, on
> > average, provide better performance than using the same SSDs for
> > Journals.
> > Specifically in the area of read throughput/latency.
> >
> Cache tiers (currently) only work well if all your hot data fits into them.
> In which case you'd be even better off with a dedicated SSD pool for that
> data.
>
> Because (currently) Ceph has to promote a full object (4MB by default) to
> the cache for each operation, be it read or write.
> That means the first time you want to read a 2KB file in your RBD backed VM,
> Ceph has to copy 4MB from the HDD pool to the SSD cache tier.
> This has of course a significant impact on read performance, in my crappy
> test cluster reading cold data is half as fast as using the actual
> non-cached HDD pool.
>

Just a FYI, there will most likely be several fixes/improvements going into Jewel which will address most of these problems with caching. Objects will now only be promoted if they are hit several times (configurable) and, if it makes it in time, there will be a promotion throttle to stop too many promotions hindering cluster performance.

However, in the context of this thread, Christian is correct: SSD journals first, and then caching if needed.

> And once your cache pool has to evict objects because it is getting full, it
> has to write out 4MB for each such object to the HDD pool.
> Then read it back in later, etc.
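
To put rough numbers on the promotion and eviction costs quoted above, here is a minimal back-of-the-envelope sketch. It is only arithmetic on the figures mentioned in the thread (4MB objects, a 2KB read), not anything Ceph itself runs. The function names are made up for illustration, and 4MB is assumed to mean 4 MiB.

```python
# Back-of-the-envelope arithmetic only; nothing here talks to a Ceph cluster.
# Assumption: "4MB by default" is treated as 4 MiB, "2KB" as 2 KiB.
OBJECT_SIZE_BYTES = 4 * 1024 * 1024


def promotion_amplification(request_bytes, object_bytes=OBJECT_SIZE_BYTES):
    """Bytes copied HDD pool -> cache tier per byte requested on a cold read."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return object_bytes / request_bytes


def evict_and_repromote_bytes(object_bytes=OBJECT_SIZE_BYTES):
    """Bytes moved when an object is written out on eviction, then read back in."""
    return 2 * object_bytes


if __name__ == "__main__":
    # A 2 KiB read of cold data pulls a whole object into the cache tier.
    print(promotion_amplification(2 * 1024))
    # One eviction followed by one later promotion of the same object.
    print(evict_and_repromote_bytes() // (1024 * 1024), "MiB moved")
```

With these assumptions a single cold 2KB read moves roughly two thousand times the data the client asked for, which is consistent with the point that cache tiers only pay off when the hot set actually fits in them.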
>
> > The main difference, I suspect, between the two approaches is that in
> > the case of multiple HDDs (multiple ceph-osd processes), all of those
> > processes share access to the same shared SSD storing their journals.
> > Whereas it's likely not the case with Cache tiering, right? Though I
> > must say I failed to find any detailed info on this. Any clarification
> > will be appreciated.
> >
> In your specific case writes to the OSDs (HDDs) will be (at least) 50%
> slower if your journals are on disk instead of the SSD.
> (Which SSDs do you plan to use anyway?)
> I don't think you'll be happy with the resulting performance.
>
> Christian.
>
> > So, is the above correct, or am I missing some pieces here? Any other
> > major differences between the two approaches?
> >
> > Thanks.
> > P.
> >
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com