Hello,

On Tue, 16 Feb 2016 18:56:43 +0100 Piotr Wachowicz wrote:

> Hey,
>
> Which one's "better": to use SSDs for storing journals, vs to use them
> as a writeback cache tier? All other things being equal.
>
Pears are better than either oranges or apples. ^_-

> The usecase is a 15 osd-node cluster, with 6 HDDs and 1 SSD per node.
> Used for block storage for a typical 20-hypervisor OpenStack cloud (with
> bunch of VMs running Linux). 10GigE public net + 10 GigE replication
> network.
>
> Let's consider both cases:
> Journals on SSDs - for writes, the write operation returns right after
> data lands on the Journal's SSDs, but before it's written to the backing
> HDD. So, for writes, SSD journal approach should be comparable to having
> a SSD cache tier.
Not quite, see below.

> In both cases we're writing to an SSD (and to
> replica's SSDs), and returning to the client immediately after that.
> Data is only flushed to HDD later on.
>
Correct. Note that the flush happens when the OSD process submits this
write to the underlying device/FS.
It doesn't go from the journal to the OSD storage device, which has the
implication that with default settings and plain HDDs you quickly wind up
being limited to what your actual HDDs can handle in a sustained manner.

>
> However for reads (of hot data) I would expect a SSD Cache Tier to be
> faster/better. That's because, in the case of having journals on SSDs,
> even if data is in the journal, it's always read from the (slow) backing
> disk anyway, right? But with a SSD cache tier, if the data is hot, it
> would be read from the (fast) SSD.
>
It will be read from the even faster pagecache if it is a sufficiently hot
object and you have sufficient RAM.

> I'm sure both approaches have their own merits, and might be better for
> some specific tasks, but with all other things being equal, I would
> expect that using SSDs as the "Writeback" cache tier should, on average,
> provide better performance than using the same SSDs for Journals.
> Specifically in the area of read throughput/latency.
>
Cache tiers (currently) only work well if all your hot data fits into them.
In which case you'd be even better off with a dedicated SSD pool for that
data.

That is because (currently) Ceph has to promote a full object (4MB by
default) to the cache for each operation, be it read or write.
That means the first time you want to read a 2KB file in your RBD backed
VM, Ceph has to copy 4MB from the HDD pool to the SSD cache tier.
This of course has a significant impact on read performance; in my crappy
test cluster, reading cold data through the cache tier is half as fast as
using the actual non-cached HDD pool.

And once your cache pool has to evict objects because it is getting full,
it has to write out 4MB for each such object to the HDD pool.
Then read it back in later, etc.

> The main difference, I suspect, between the two approaches is that in the
> case of multiple HDDs (multiple ceph-osd processes), all of those
> processes share access to the same shared SSD storing their journals.
> Whereas it's likely not the case with Cache tiering, right? Though I
> must say I failed to find any detailed info on this. Any clarification
> will be appreciated.
>
In your specific case writes to the OSDs (HDDs) will be (at least) 50%
slower if your journals are on disk instead of the SSD.
(Which SSDs do you plan to use anyway?)
I don't think you'll be happy with the resulting performance.

Christian.

> So, is the above correct, or am I missing some pieces here? Any other
> major differences between the two approaches?
>
> Thanks.
> P.
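
To put rough numbers on the full-object promotion cost described above,
here is a back-of-the-envelope sketch in Python. It only models the byte
counts mentioned in this mail (one 4MB object promoted per cold access, one
4MB write-out per evicted object) and assumes the request falls within a
single object. The function names are made up for illustration and are not
Ceph APIs.

```python
# Back-of-the-envelope model of cache tier promotion cost.
# Assumption: every cold access promotes one whole object (4MB by default),
# as described above. Function names are hypothetical, not Ceph APIs.

OBJECT_BYTES = 4 * 1024 * 1024  # default object size quoted above


def promotion_amplification(request_bytes, object_bytes=OBJECT_BYTES):
    """Bytes copied from the HDD pool per byte the client asked for."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return object_bytes / request_bytes


def evict_and_repromote_bytes(object_bytes=OBJECT_BYTES):
    """HDD pool traffic for one object that is written out on eviction and
    promoted again later: one full-object write plus one full-object read."""
    return 2 * object_bytes


if __name__ == "__main__":
    # The 2KB cold read from the example above.
    print(promotion_amplification(2 * 1024))
    # One evict-then-promote cycle for a single object, in bytes.
    print(evict_and_repromote_bytes())
```

Under these assumptions a 2KB cold read moves 2048 times the data the
client requested, and each evict-then-promote cycle costs 8MB of HDD pool
traffic per object. Real clusters will differ with object size and
workload.
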
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com