OK, for higher availability 5 nodes are better than 3, so we'll run 5. However, we want normal operations with just 2 nodes up. Is that possible?

Eventually 2 of the nodes will be in the next building, 10 feet away with a brick wall in between, connected with InfiniBand or better. That way one room can go offline and the other stays up. By the flip of the coin, the 3-node room is the one that will probably go down.

All systems will have dual power supplies connected to different UPSes. In addition we have a power generator; later we'll add a second generator, and the UPSes will then be fed from separate lines attached to those generators.

Also, of course, we never count on one cluster to hold our data. We have 2 co-location sites that we back up to frequently using zfs send/receive and/or rsync.

So for the 5-node cluster, how do we set it so that 2 nodes up = OK? Or is that a bad idea?

PS: any other ideas on how to increase availability are welcome.
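If I understand the knobs correctly, something along these lines is what I have in mind -- the pool name "rbd" and the 4/2 numbers are only placeholders, and this assumes a CRUSH rule that places 2 of the 4 copies in each room:

  # placeholders: 4 copies total, keep serving I/O with only 2 copies left
  ceph osd pool set rbd size 4
  ceph osd pool set rbd min_size 2
  # check which monitors currently form the quorum
  ceph quorum_status

But if I read the quorum explanation below correctly, that only covers the OSD side: with 5 monitors, 3 still have to be up to form a quorum, so the 2-node room on its own would never be enough for the monitors. Is that right?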
On Mon, Jul 28, 2014 at 12:29 PM, Christian Balzer <chibi at gol.com> wrote:

> On Mon, 28 Jul 2014 11:22:38 +0100 Joao Eduardo Luis wrote:
>
> > On 07/28/2014 08:49 AM, Christian Balzer wrote:
> > >
> > > Hello,
> > >
> > > On Sun, 27 Jul 2014 18:20:43 -0400 Robert Fantini wrote:
> > >
> > >> Hello Christian,
> > >>
> > >> Let me supply more info and answer some questions.
> > >>
> > >> * Our main concern is high availability, not speed. Our storage
> > >> requirements are not huge. However, we want good keyboard response
> > >> 99.99% of the time. We mostly do data entry and reporting: 20-25
> > >> users doing mostly order and invoice processing and email.
> > >>
> > >> * DRBD has been very reliable, but I am the SPOF. Meaning that when
> > >> split brain occurs [every 18-24 months] it is me or no one who knows
> > >> what to do. Try to explain how to deal with split brain in
> > >> advance.... For the future, Ceph looks like it will be easier to
> > >> maintain.
> > >>
> > > The DRBD people would of course tell you to configure things in a way
> > > that a split brain can't happen. ^o^
> > >
> > > Note that given the right circumstances (too many OSDs down, MONs
> > > down) Ceph can wind up in a similar state.
> > >
> > I am not sure what you mean by Ceph winding up in a similar state. If
> > you mean 'split brain' in the usual sense of the term, it does not
> > occur in Ceph. If it does, you have surely found a bug and you should
> > let us know with lots of CAPS.
> >
> > What you can incur, though, if you have too many monitors down, is
> > cluster downtime. The monitors will ensure you need a strict majority
> > of monitors up in order to operate the cluster, and will not serve
> > requests if said majority is not in place. The monitors will only
> > serve requests when there's a formed 'quorum', and a quorum is only
> > formed by (N/2)+1 monitors, N being the total number of monitors in
> > the cluster (via the monitor map -- monmap).
> >
> > This said, if out of 3 monitors you have 2 monitors down, your cluster
> > will cease functioning (no admin commands, no writes or reads served).
> > As there is no configuration in which you can have two strict
> > majorities, and thus no two partitions of the cluster can function at
> > the same time, you do not incur split brain.
> >
> I wrote "similar state", not "same state".
>
> From a user perspective it is purely semantics how and why your shared
> storage has seized up; the end result is the same.
>
> And yes, that MON example was exactly what I was aiming for: your
> cluster might still have all the data (another potential failure mode,
> of course), but it is inaccessible.
>
> DRBD will see and call it a split brain, Ceph will call it a Paxos
> voting failure; it doesn't matter one iota to the poor sod relying on
> that particular storage.
>
> My point was and is: when you design a cluster of whatever flavor, make
> sure you understand how it can (and WILL) fail, how to prevent that
> from happening if at all possible, and how to recover from it if not.
>
> Potentially (hopefully), in the case of Ceph that would just mean
> getting a missing MON back up.
> But given that the failed MON might have a corrupted leveldb (it
> happened to me), that would put Robert back at square one, as in, a
> highly qualified engineer has to deal with the issue.
> I.e. somebody who can say "screw this dead MON, let's get a new one in"
> and is capable of doing so.
>
> Regards,
>
> Christian
>
> > If you are a creative admin, however, you may be able to enforce split
> > brain by modifying monmaps. In the end you'd obviously end up with two
> > distinct monitor clusters, but if you so happened to not inform the
> > clients about this, there's a fair chance it would cause havoc with
> > unforeseen effects. Then again, this would be the operator's fault,
> > not Ceph's -- especially because rewriting monitor maps is not trivial
> > enough for someone to mistakenly do something like this.
> >
> > -Joao
> >
>
> --
> Christian Balzer           Network/Systems Engineer
> chibi at gol.com              Global OnLine Japan/Fusion Communications
> http://www.gol.com/