Re: Best practice for osd_min_down_reporters

On 05/07/2013 04:40 PM, Gregory Farnum wrote:
On Tuesday, May 7, 2013, Wido den Hollander wrote:

    Hi,

    I was just upgrading a 9-node, 36-OSD cluster running the next
    branch from some days ago to the Cuttlefish release.

    While rebooting the nodes one by one and waiting for an active+clean
    state on all PGs, I noticed that some weird things happened.

    I reboot a node and see:

    "osdmap e580: 36 osds: 4 up, 36 in"

    After a few seconds I see all the OSDs reporting:

    osd.33 [WRN] map e582 wrongly marked me down
    osd.5 [WRN] map e582 wrongly marked me down
    osd.6 [WRN] map e582 wrongly marked me down

    I didn't check what was happening here, but it seems like the 4 OSDs
    that were shutting down reported everybody but themselves down (I
    should have captured the output of ceph osd tree).

    Thinking about that, there is the following configuration option:

    OPTION(osd_min_down_reporters, OPT_INT, 1)
    OPTION(osd_min_down_reports, OPT_INT, 3)

    So if just one OSD sends 3 reports it can mark anybody in the
    cluster down, right?

    Shouldn't the best practice be to set osd_min_down_reporters to at
    least numosdperhost+1?

    In this case I have 4 OSDs per host, so shouldn't I use 5 here?

    This might as well be a bug, but it still doesn't seem right that
    all the OSDs on one machine can mark the whole cluster down.
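
To make the question above concrete, here is a minimal Python sketch of the two thresholds as the post reads them. It is not the monitor's actual code: the function name would_mark_down and the way reports are counted are assumptions made for illustration only.

```python
from collections import Counter


def would_mark_down(reports, min_reporters=1, min_reports=3):
    """Toy model of the reading in the post (NOT the real monitor logic).

    `reports` is a list of reporter OSD names, one entry per failure
    report filed against a single target OSD. The target counts as down
    once enough distinct reporters AND enough total reports are seen.
    """
    per_reporter = Counter(reports)
    distinct_reporters = len(per_reporter)
    total_reports = sum(per_reporter.values())
    return distinct_reporters >= min_reporters and total_reports >= min_reports


# One OSD sending three reports satisfies the defaults (1 reporter, 3 reports).
single_reporter = ["osd.0", "osd.0", "osd.0"]
print(would_mark_down(single_reporter))

# Four OSDs on one host cannot reach a threshold of five reporters.
one_host = ["osd.0", "osd.1", "osd.2", "osd.3"]
print(would_mark_down(one_host, min_reporters=5))
```

Under this reading, raising the reporter threshold above the number of OSDs on a single host means at least one OSD on another host has to agree before the target is marked down.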


I'm a little surprised that OSDs turning off could have marked anybody
down at all. :/ Do you have any more info?


I was surprised as well. I'd have to dig a bit deeper to see what happened.

In any case, yeah, you probably want to increase your "reporters"
required. That value is set at 1 so it works on a 2-node cluster. :)

Does it seem sane to at least have this value greater than the number of OSDs on one host? That way a single host can't mark the rest down when it gets into a weird situation.
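
As a rough illustration of that rule of thumb, the Python sketch below derives the value from a host-to-OSD layout. The helper name suggested_min_down_reporters and the host names node0..node8 are hypothetical; the layout simply mirrors the 9 hosts with 4 OSDs each described earlier in the thread.

```python
def suggested_min_down_reporters(osds_by_host):
    """Return (largest number of OSDs on any one host) + 1.

    This encodes only the rule of thumb proposed in this thread,
    not an official Ceph recommendation.
    """
    return max(len(osds) for osds in osds_by_host.values()) + 1


# Hypothetical layout: 9 hosts with 4 OSDs each, as in the cluster above.
layout = {
    f"node{host}": [f"osd.{host * 4 + i}" for i in range(4)]
    for host in range(9)
}

value = suggested_min_down_reporters(layout)
print(f"osd_min_down_reporters = {value}")
```

Taking the maximum over hosts rather than the average keeps the rule safe for clusters where hosts carry different numbers of OSDs.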

-Greg


--
Software Engineer #42 @ http://inktank.com | http://ceph.com


--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on