"target replication level of 3" " with a min of 1 across the node level" After reading http://ceph.com/docs/master/rados/configuration/ceph-conf/ , I assume that to accomplish that then set these in ceph.conf ? osd pool default size = 3 osd pool default min size = 1 On Mon, Jul 28, 2014 at 2:56 PM, Michael <michael at onlinefusion.co.uk> wrote: > If you've two rooms then I'd go for two OSD nodes in each room, a target > replication level of 3 with a min of 1 across the node level, then have 5 > monitors and put the last monitor outside of either room (The other MON's > can share with the OSD nodes if needed). Then you've got 'safe' replication > for OSD/node replacement on failure with some 'shuffle' room for when it's > needed and either room can be down while the external last monitor allows > the decisions required to allow a single room to operate. > > There's no way you can do a 3/2 MON split that doesn't risk the two nodes > being up and unable to serve data while the three are down so you'd need to > find a way to make it a 2/2/1 split instead. > > -Michael > > > On 28/07/2014 18:41, Robert Fantini wrote: > > OK for higher availability then 5 nodes is better then 3 . So we'll > run 5 . However we want normal operations with just 2 nodes. Is that > possible? > > Eventually 2 nodes will be next building 10 feet away , with a brick wall > in between. Connected with Infiniband or better. So one room can go off > line the other will be on. The flip of the coin means the 3 node room > will probably go down. > All systems will have dual power supplies connected to different UPS'. > In addition we have a power generator. Later we'll have a 2-nd generator. > and then the UPS's will use different lines attached to those generators > somehow.. > Also of course we never count on one cluster to have our data. We have > 2 co-locations with backup going to often using zfs send receive and or > rsync . > > So for the 5 node cluster, how do we set it so 2 nodes up = OK ? Or > is that a bad idea? > > > PS: any other idea on how to increase availability are welcome . > > > > > > > > > On Mon, Jul 28, 2014 at 12:29 PM, Christian Balzer <chibi at gol.com> wrote: > >> On Mon, 28 Jul 2014 11:22:38 +0100 Joao Eduardo Luis wrote: >> >> > On 07/28/2014 08:49 AM, Christian Balzer wrote: >> > > >> > > Hello, >> > > >> > > On Sun, 27 Jul 2014 18:20:43 -0400 Robert Fantini wrote: >> > > >> > >> Hello Christian, >> > >> >> > >> Let me supply more info and answer some questions. >> > >> >> > >> * Our main concern is high availability, not speed. >> > >> Our storage requirements are not huge. >> > >> However we want good keyboard response 99.99% of the time. We >> > >> mostly do data entry and reporting. 20-25 users doing mostly >> > >> order , invoice processing and email. >> > >> >> > >> * DRBD has been very reliable , but I am the SPOF . Meaning that >> > >> when split brain occurs [ every 18-24 months ] it is me or no one who >> > >> knows what to do. Try to explain how to deal with split brain in >> > >> advance.... For the future ceph looks like it will be easier to >> > >> maintain. >> > >> >> > > The DRBD people would of course tell you to configure things in a way >> > > that a split brain can't happen. ^o^ >> > > >> > > Note that given the right circumstances (too many OSDs down, MONs >> down) >> > > Ceph can wind up in a similar state. >> > >> > >> > I am not sure what you mean by ceph winding up in a similar state. 
On Mon, Jul 28, 2014 at 2:56 PM, Michael <michael at onlinefusion.co.uk> wrote:

> If you've two rooms then I'd go for two OSD nodes in each room, a target
> replication level of 3 with a min of 1 across the node level, then have 5
> monitors and put the last monitor outside of either room (the other MONs
> can share with the OSD nodes if needed). Then you've got 'safe' replication
> for OSD/node replacement on failure with some 'shuffle' room for when it's
> needed, and either room can be down while the external last monitor allows
> the decisions required to let a single room keep operating.
>
> There's no way you can do a 3/2 MON split that doesn't risk the two nodes
> being up and unable to serve data while the three are down, so you'd need
> to find a way to make it a 2/2/1 split instead.
>
> -Michael
>
> On 28/07/2014 18:41, Robert Fantini wrote:
>
> OK, for higher availability 5 nodes is better than 3, so we'll run 5.
> However, we want normal operations with just 2 nodes. Is that possible?
>
> Eventually 2 nodes will be in the next building 10 feet away, with a brick
> wall in between, connected with InfiniBand or better. So one room can go
> offline and the other will stay on. The flip of the coin means the 3-node
> room will probably go down.
> All systems will have dual power supplies connected to different UPSs.
> In addition we have a power generator. Later we'll have a 2nd generator,
> and then the UPSs will use different lines attached to those generators
> somehow.
> Also, of course, we never count on one cluster to have our data. We have
> 2 co-locations with backups going out often using zfs send/receive and/or
> rsync.
>
> So for the 5-node cluster, how do we set it so that 2 nodes up = OK? Or
> is that a bad idea?
>
> PS: any other ideas on how to increase availability are welcome.
>
> On Mon, Jul 28, 2014 at 12:29 PM, Christian Balzer <chibi at gol.com> wrote:
>
>> On Mon, 28 Jul 2014 11:22:38 +0100 Joao Eduardo Luis wrote:
>>
>> > On 07/28/2014 08:49 AM, Christian Balzer wrote:
>> > >
>> > > Hello,
>> > >
>> > > On Sun, 27 Jul 2014 18:20:43 -0400 Robert Fantini wrote:
>> > >
>> > >> Hello Christian,
>> > >>
>> > >> Let me supply more info and answer some questions.
>> > >>
>> > >> * Our main concern is high availability, not speed.
>> > >> Our storage requirements are not huge.
>> > >> However we want good keyboard response 99.99% of the time. We
>> > >> mostly do data entry and reporting: 20-25 users doing mostly
>> > >> order and invoice processing and email.
>> > >>
>> > >> * DRBD has been very reliable, but I am the SPOF. Meaning that
>> > >> when split brain occurs [every 18-24 months] it is me or no one who
>> > >> knows what to do. Try to explain how to deal with split brain in
>> > >> advance.... For the future, Ceph looks like it will be easier to
>> > >> maintain.
>> > >>
>> > > The DRBD people would of course tell you to configure things in a way
>> > > that a split brain can't happen. ^o^
>> > >
>> > > Note that given the right circumstances (too many OSDs down, MONs down)
>> > > Ceph can wind up in a similar state.
>> >
>> > I am not sure what you mean by Ceph winding up in a similar state. If
>> > you mean 'split brain' in the usual sense of the term, it does not
>> > occur in Ceph. If it does, you have surely found a bug and you should
>> > let us know with lots of CAPS.
>> >
>> > What you can incur, though, if you have too many monitors down, is
>> > cluster downtime. The monitors ensure you need a strict majority of
>> > monitors up in order to operate the cluster, and will not serve requests
>> > if said majority is not in place. The monitors will only serve requests
>> > when there's a formed 'quorum', and a quorum is only formed by (N/2)+1
>> > monitors, N being the total number of monitors in the cluster (via the
>> > monitor map -- monmap).
>> >
>> > This said, if out of 3 monitors you have 2 monitors down, your cluster
>> > will cease functioning (no admin commands, no writes or reads served).
>> > As there is no configuration in which you can have two strict
>> > majorities, no two partitions of the cluster are able to function
>> > at the same time, so you do not incur split brain.
>> >
>> I wrote similar state, not "same state".
>>
>> From a user perspective it is purely semantics how and why your shared
>> storage has seized up; the end result is the same.
>>
>> And yes, that MON example was exactly what I was aiming for: your cluster
>> might still have all the data (another potential failure mode, of course),
>> but it is inaccessible.
>>
>> DRBD will see and call it a split brain, Ceph will call it a Paxos voting
>> failure; it doesn't matter one iota to the poor sod relying on that
>> particular storage.
>>
>> My point was and is: when you design a cluster of whatever flavor, make
>> sure you understand how it can (and WILL) fail, how to prevent that from
>> happening if at all possible, and how to recover from it if not.
>>
>> Potentially (hopefully) in the case of Ceph it would just be a matter of
>> getting a missing MON back up.
>> But given that the failed MON might have a corrupted leveldb (it happened
>> to me), that will put Robert back at square one, as in, a highly qualified
>> engineer has to deal with the issue.
>> I.e. somebody who can say "screw this dead MON, let's get a new one in"
>> and is capable of doing so.
>>
>> Regards,
>>
>> Christian
>>
>> > If you are a creative admin, however, you may be able to force split
>> > brain by modifying monmaps. In the end you'd obviously end up with two
>> > distinct monitor clusters, but if you so happened to not inform the
>> > clients about this there's a fair chance that it would cause havoc with
>> > unforeseen effects. Then again, this would be the operator's fault, not
>> > Ceph's -- especially because rewriting monitor maps is not trivial
>> > enough for someone to mistakenly do something like this.
>> >
>> > -Joao
>> >
>>
>> --
>> Christian Balzer        Network/Systems Engineer
>> chibi at gol.com         Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
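PS: trying to make sure I follow Joao's quorum arithmetic for the monitor counts we're discussing -- this is my own back-of-the-envelope, so please correct me if it's off:

    quorum = floor(N/2) + 1
    3 MONs -> quorum of 2 -> at most 1 MON may be down
    5 MONs -> quorum of 3 -> at most 2 MONs may be down

So with Michael's 2/2/1 split, losing either room costs at most 2 of the 5 MONs, and the remaining 3 (including the one outside both rooms) can still form a quorum. That seems to be the whole point of keeping the 5th monitor elsewhere.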