On Mon, Aug 11, 2014 at 11:26 PM, John Morris <john at zultron.com> wrote:

> On 08/11/2014 08:26 PM, Craig Lewis wrote:
>
>> Your MON nodes are separate hardware from the OSD nodes, right?
>
> Two nodes are OSD + MON, plus a separate MON node.
>
>> If so, with replication=2, you should be able to shut down one of the
>> two OSD nodes, and everything will continue working.
>
> IIUC, the third MON node is sufficient for a quorum if one of the
> OSD + MON nodes shuts down, is that right?

So yeah, if you lose any one node, you'll be fine.

> Replication=2 is a little worrisome, since we've already seen two disks
> simultaneously fail just in the year the cluster has been running. That
> statistically unlikely situation is the first and probably last time
> I'll see that, but they say lightning can strike twice....

That's a low probability, given the number of disks you have. I would've
taken that bet (with backups). As the number of OSDs goes up, the
probability of multiple simultaneous failures goes up, and it slowly
becomes a bad bet.

>> Since it's for experimentation, I wouldn't deal with the extra hassle
>> of replication=4 and custom CRUSH rules to make it work. If you have
>> your heart set on that, it should be possible. I'm no CRUSH expert
>> though, so I can't say for certain until I've actually done it.
>>
>> I'm a bit confused why your performance is horrible though. I'm
>> assuming your HDDs are 7200 RPM. With the SSD journals and
>> replication=3, you won't have a ton of IO, but you shouldn't have any
>> problem doing >100 MB/s with 4 MB blocks. Unless your SSDs are very
>> low quality, the HDDs should be your bottleneck.
>
> The below setup is tomorrow's plan; today's reality is 3 OSDs on one
> node and 2 OSDs on another, crappy SSDs, 1Gb networks, pgs stuck
> unclean, and no monitoring to pinpoint bottlenecks. My work is cut out
> for me. :)
>
> Thanks for the helpful reply. I wish we could just add a third OSD node
> and have these issues just go away, but it's not in the budget ATM.

Ah, yeah, that explains the performance problems. Although, crappy SSD
journals are still better than no SSD journals. When I added SSD journals
to my existing cluster, I saw my write bandwidth go from 10 MBps/disk to
50 MBps/disk. Average latency dropped a bit, and the variance in latency
dropped a lot.

Just adding more disks to your existing nodes would help performance,
assuming you have room to add them.
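
A few concrete bits that might save you some digging. On the quorum
question: with three MONs total, any two form a majority, so losing one
of the OSD+MON boxes still leaves a working quorum. You can ask the
monitors directly (commands from memory, double-check against your
Ceph version):

    # list the monitors and show which ones are currently in quorum
    ceph mon stat
    # more detail, including the current quorum leader
    ceph quorum_status --format json-pretty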
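
If you do end up wanting replication=4 across just the two OSD hosts,
the usual trick is a rule that picks two hosts and then two OSDs on each
host. I haven't tested this on a layout like yours, so treat it as a
sketch rather than a recipe (the ruleset number and pool name are
placeholders):

    # decompile the CRUSH map, add a rule, recompile, and load it back
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # add something like this to crushmap.txt:
    rule replicated_2host_4copy {
            ruleset 1
            type replicated
            min_size 4
            max_size 4
            step take default
            step choose firstn 2 type host       # pick 2 hosts
            step chooseleaf firstn 2 type osd    # then 2 OSDs on each
            step emit
    }

    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

    # point the pool at the new rule and bump its size
    ceph osd pool set <pool> crush_ruleset 1
    ceph osd pool set <pool> size 4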
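
For the journals: if you're using ceph-deploy, the journal device can be
given as a third field when preparing an OSD; with manually deployed
OSDs it's the "osd journal" setting in ceph.conf. Host and device names
below are made up, and the syntax is from the docs of that era, so
verify against your version:

    # ceph-deploy syntax is host:data-disk[:journal-device]
    ceph-deploy osd prepare osdhost1:sdb:/dev/sdf1

    # or, per OSD, in ceph.conf:
    [osd.0]
        osd journal = /dev/disk/by-partlabel/journal-osd0
        osd journal size = 0    # 0 = use the whole partition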
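
And for the "pgs stuck unclean and no monitoring" part, the built-in
commands go a long way before you set up real monitoring; again, adjust
the pool name to taste:

    # why PGs are unclean, and which ones
    ceph health detail
    ceph pg dump_stuck unclean

    # per-OSD commit/apply latency, handy for spotting a slow disk or SSD
    ceph osd perf

    # rough write-bandwidth number with 4 MB objects (the default size)
    rados bench -p <testpool> 60 write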