Re: How To Properly Failover a HA Setup

I tried setting noout, and that gave a somewhat better result.  Basically, I could stop the OSD on the inactive server and everything still worked (after a 2-3 second pause), but when I rebooted the inactive server everything hung again until it came back online and resynced with the cluster.  This is what I saw in ceph -s:

    cluster eb2003cf-b16d-4551-adb7-892469447f89
     health HEALTH_WARN
            128 pgs degraded
            124 pgs stuck unclean 
            128 pgs undersized
            recovery 805252/1610504 objects degraded (50.000%)
            mds cluster is degraded
            1/2 in osds are down  
            noout flag(s) set
     monmap e1: 3 mons at {FILE1=10.1.1.201:6789/0,FILE2=10.1.1.202:6789/0,MON1=10.1.1.90:6789/0}
            election epoch 216, quorum 0,1,2 FILE1,FILE2,MON1
      fsmap e796: 1/1/1 up {0=FILE2=up:rejoin}
     osdmap e360: 2 osds: 1 up, 2 in; 128 remapped pgs
            flags noout,sortbitwise,require_jewel_osds
      pgmap v7056802: 128 pgs, 3 pools, 164 GB data, 786 kobjects
            349 GB used, 550 GB / 899 GB avail
            805252/1610504 objects degraded (50.000%)
                 128 active+undersized+degraded
  client io 1379 B/s rd, 1 op/s rd, 0 op/s wr
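
As an aside, the 50.000% degraded figure looks consistent with 2-replica pools and one of the two OSDs down.  I want to double-check my pool settings with the commands below (cephfs_data is just an example pool name, not necessarily what's configured here; my understanding is that if min_size equals size, IO blocks whenever a replica is missing, which might be related to what I'm seeing):

ceph osd pool ls                        # list the pool names
ceph osd pool get cephfs_data size      # replica count for the pool
ceph osd pool get cephfs_data min_size  # replicas required before IO is served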

These are the commands I ran and the results:
ceph osd set noout
systemctl stop ceph-mds@FILE2.service
# Everything still works on the clients...
systemctl stop ceph-osd@0.service # This was on FILE2 while FILE1 was the active fsmap
# Fails over quickly, can still read content on the clients..
# Rebooted FILE2
# File access on the clients locked up until FILE2 rejoined
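
For the record, this is the fuller sequence I'm planning to try on the next reboot.  Pairing noout with norebalance is just my guess at a safer maintenance window, not something I've confirmed helps in a 2-OSD setup:

ceph osd set noout          # don't mark the down OSD out and trigger re-replication
ceph osd set norebalance    # my assumption: also hold off rebalancing during the reboot
systemctl stop ceph-mds@FILE2.service
systemctl stop ceph-osd@0.service
# ... reboot FILE2 and wait for it to rejoin ...
ceph osd unset norebalance
ceph osd unset noout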


This is on Ubuntu 16 with kernel 4.4.0-141, so I'm not sure if that qualifies for David's warning about old kernels...

Is there a command or a logfile I can look at that will better help to diagnose this issue?  Are three servers (with only 2 OSDs) enough to run an HA cluster on Ceph, or does it just die when it doesn't have 3 active servers for a quorum? Would installing MDS and MON on a 4th box (but sticking with 2 OSDs) be enough to resolve this?  I really don't want to do that, but if I have to I guess I can look into finding another box.
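
In the meantime, here's what I plan to capture during the next test reboot.  As far as I know these are all standard commands and the default log locations, but corrections are welcome:

ceph health detail        # expands HEALTH_WARN into per-pg detail
ceph -w                   # watch cluster events live while rebooting
ceph quorum_status        # confirm the mons keep quorum (2 of 3 should be enough)
ceph mds stat             # watch the MDS failover state
# default log locations on the servers:
#   /var/log/ceph/ceph-osd.0.log
#   /var/log/ceph/ceph-mds.FILE2.log
#   /var/log/ceph/ceph-mon.FILE2.log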


On 2019-01-21 5:01 p.m., ceph-users-request@xxxxxxxxxxxxxx wrote:

Message: 14
Date: Mon, 21 Jan 2019 10:05:15 +0100
From: Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  How To Properly Failover a HA Setup
Message-ID: <587dac75-96bc-8719-ee62-38e71491cd20@xxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

On 21.01.19 09:22, Charles Tassell wrote:
Hello Everyone,

I've got a 3 node Jewel cluster setup, and I think I'm missing 
something. When I want to take one of my nodes down for maintenance 
(kernel upgrades or the like) all of my clients (running the kernel 
module for the cephfs filesystem) hang for a couple of minutes before 
the redundant servers kick in.
Have you set the noout flag before doing cluster maintenance?

ceph osd set noout

and afterwards

ceph osd unset noout

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
