Hello Nick,

On Wed, 4 Mar 2015 08:49:22 -0000 Nick Fisk wrote:

> Hi Christian,
>
> Yes, that's correct, it's on the client side. I don't see this as much
> different to a battery-backed RAID controller: if you lose power, the
> data is in the cache until power resumes, when it is flushed.
>
> If you are going to have the same RBD accessed by multiple
> servers/clients then you need to make sure the SSD is accessible to
> both (e.g. DRBD / dual-port SAS). But then something like Pacemaker
> would be responsible for ensuring the RBD and cache device are both
> present before allowing client access.
>
Which is pretty much any and all use cases I can think of. Because it's
not only about concurrent (active/active) access; you really need to
have things consistent across all possible client hosts in case of a
node failure.

I'm no stranger to DRBD and Pacemaker (which incidentally didn't make it
into Debian Jessie, cue massive laughter and ridicule), btw.

> When I wrote this I was thinking more about 2 HA iSCSI servers with
> RBDs; however, I can understand that this feature would prove more of
> a challenge if you are using Qemu and RBD.
>
One of the reasons I'm using Ceph/RBD instead of DRBD (which is vastly
more suited for some use cases) is that it allows me n+1 instead of n+n
redundancy when it comes to consumers (compute nodes in my case).

Now for your iSCSI head (looking forward to your results and any config
recipes) that limitation to a pair may be just as well, but as others
wrote it might be best to go forward with this outside of Ceph.
Especially since you're already dealing with an HA cluster/Pacemaker in
that scenario.

Christian

> Nick
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> Of Christian Balzer
> Sent: 04 March 2015 08:40
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Nick Fisk
> Subject: Re: Persistent Write Back Cache
>
> Hello,
>
> If I understand you correctly, you're talking about the rbd cache on
> the client side.
>
> So assume that host, or the cache SSD in it, fails terminally.
> The client thinks its sync'ed writes are on the permanent storage (the
> actual Ceph storage cluster), while they are only present locally.
>
> So restarting that service or VM on a different host now has to deal
> with likely crippling data corruption.
>
> Regards,
>
> Christian
>
> On Wed, 4 Mar 2015 08:26:52 -0000 Nick Fisk wrote:
>
> > Hi All,
> >
> > Is there anything in the pipeline to add the ability to write the
> > librbd cache to SSD so that it can safely ignore sync requests? I
> > have seen a thread a few years back where Sage was discussing
> > something similar, but I can't find anything more recent discussing
> > it.
> >
> > I've been running lots of tests on our new cluster; buffered/parallel
> > performance is amazing (40K read / 10K write IOPS), very impressed.
> > However, sync writes are actually quite disappointing.
> >
> > Running fio with a 128k block size and depth=1 normally only gives me
> > about 300 IOPS or 30MB/s. I'm seeing 2-3ms latency writing to SSD
> > OSDs, and from what I hear that's about normal, so I don't think I
> > have a Ceph config problem. For applications which do a lot of syncs,
> > like ESXi over iSCSI or SQL databases, this has a major performance
> > impact.
> >
> > Traditional storage arrays work around this problem by having a
> > battery-backed cache which has latency 10-100 times less than what
> > you can currently achieve with Ceph and an SSD.
> > Whilst librbd does have a writeback cache, from what I understand it
> > will not cache syncs, and so in my usage case it effectively acts
> > like a write-through cache.
> >
> > To illustrate the difference a proper writeback cache can make, I put
> > a 1GB (512MB dirty threshold) flashcache in front of my RBD and
> > tweaked the flush parameters to flush dirty blocks at a large queue
> > depth. The same fio test (128k, iodepth=1) now runs at 120MB/s and is
> > limited by the performance of the SSD used by flashcache, as
> > everything is stored as 4k blocks on the SSD. In fact, since
> > everything is stored as 4k blocks, pretty much all IO sizes are
> > accelerated to the max speed of the SSD. Looking at iostat I can see
> > all the IOs are getting coalesced into nice large 512kb IOs at a high
> > queue depth, which Ceph easily swallows.
> >
> > If librbd could support writing its cache out to SSD, it would
> > hopefully achieve the same level of performance, and having it
> > integrated would be really neat.
> >
> > Nick
> >

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
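
For anyone wanting to reproduce the kind of sync-write test discussed in
this thread, a minimal sketch follows. It assumes a kernel-mapped RBD at
/dev/rbd0 (a placeholder, not a path taken from the thread) and
approximates the 128k, queue-depth-1 sync workload Nick describes; the
job name and runtime are likewise illustrative assumptions.

    # 128k sync writes at queue depth 1 against a mapped RBD.
    # /dev/rbd0 is an assumed device name; point it at your own mapping.
    fio --name=rbd-sync-write --filename=/dev/rbd0 \
        --ioengine=libaio --direct=1 --sync=1 \
        --rw=write --bs=128k --iodepth=1 \
        --runtime=60 --time_based

Because each write is issued with O_SYNC at a queue depth of one, the run
is bounded by per-write latency to the OSDs rather than by bandwidth,
which is why 2-3ms per write caps it at a few hundred IOPS.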
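
The flashcache experiment from the original post could be approximated
along the lines below. Device names are placeholders, and the writeback
tunables (whose names and defaults vary between flashcache releases) are
assumptions chosen to mirror the described setup: a 1GB cache, roughly
half of it allowed to sit dirty, and more aggressive cleaning so dirty
blocks are flushed at a higher queue depth.

    # Create a writeback ("-p back") flashcache device named rbd_wb on an
    # SSD partition in front of the RBD; 1GB cache as described in the
    # post. /dev/sdb1 and /dev/rbd0 are assumed device names.
    flashcache_create -p back -s 1g rbd_wb /dev/sdb1 /dev/rbd0

    # Let roughly half the cache sit dirty (~512MB of 1GB) and clean it
    # with more IOs in flight; exact sysctl names depend on the
    # flashcache version in use.
    sysctl -w dev.flashcache.sdb1+rbd0.dirty_thresh_pct=50
    sysctl -w dev.flashcache.sdb1+rbd0.max_clean_ios_set=32
    sysctl -w dev.flashcache.sdb1+rbd0.max_clean_ios_total=128

    # Re-run the same sync-write fio job against the cached device.
    fio --name=rbd-sync-write-cached --filename=/dev/mapper/rbd_wb \
        --ioengine=libaio --direct=1 --sync=1 \
        --rw=write --bs=128k --iodepth=1 \
        --runtime=60 --time_based

The caveat Christian raises still applies, of course: until the cleaner
writes them back to the cluster, the dirty blocks exist only on that one
client's SSD, so losing the host or the SSD loses acknowledged writes.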