Handling is_readable=0 periods in mon

John Spray <jspray@xxxxxxxxxx> · Tue, 23 May 2017 18:29:23 +0100

Hi all,

I could use some help from people who understand the mon better than I
do with this ticket: http://tracker.ceph.com/issues/19706

The issue is that MDSMonitor is incorrectly killing MDSs because it
hasn't seen beacon messages, but the beacon messages are actually just
held up because is_readable = 0, like this:
2017-05-23 13:34:20.054785 7f772f1c2700 10 mon.b@0(leader).paxosservice(mdsmap 1..11) dispatch 0x7f7742989740 mdsbeacon(4141/a up:active seq 96 v9) v7 from mds.0 172.21.15.77:6809/2700711429 con 0x7f77428d8f00
2017-05-23 13:34:20.054788 7f772f1c2700  5 mon.b@0(leader).paxos(paxos recovering c 1..293) is_readable = 0 - now=2017-05-23 13:34:20.054789 lease_expire=0.000000 has v0 lc 293
2017-05-23 13:34:20.054791 7f772f1c2700 10 mon.b@0(leader).paxosservice(mdsmap 1..11)  waiting for paxos -> readable (v9)

This seems to happen when one or more mons are a bit laggy, but
before any election has taken place.

We already have code that handles slow elections: it checks how long
it has been since the last tick and, if that is too long, resets our
timeout information for MDS beacons
(https://github.com/ceph/ceph/blob/master/src/mon/MDSMonitor.cc#L2070).
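
For anyone not familiar with that check, here is a rough C++ sketch of
the idea. It is not the actual MDSMonitor code: the BeaconTracker
class, its method names and the max_tick_gap parameter are made up for
illustration, and times are plain seconds.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the monitor's beacon bookkeeping.
class BeaconTracker {
public:
  // grace: seconds a daemon may go without a processed beacon.
  // max_tick_gap: if two ticks are further apart than this, assume the
  // monitor itself was stalled and forgive all daemons.
  BeaconTracker(double grace, double max_tick_gap)
    : grace_(grace), max_tick_gap_(max_tick_gap) {
    assert(grace > 0 && max_tick_gap > 0);
  }

  // Record a beacon that was actually processed at time `now`.
  void on_beacon(uint64_t gid, double now) { last_beacon_[gid] = now; }

  // Periodic check; returns the daemons whose beacons are overdue.
  std::vector<uint64_t> tick(double now) {
    if (have_ticked_ && now - last_tick_ > max_tick_gap_) {
      // We were not ticking for a while, so missing beacons are
      // probably our fault: restart every daemon's timeout.
      for (auto &entry : last_beacon_)
        entry.second = now;
    }
    last_tick_ = now;
    have_ticked_ = true;

    std::vector<uint64_t> late;
    for (const auto &entry : last_beacon_) {
      if (now - entry.second > grace_)
        late.push_back(entry.first);
    }
    return late;
  }

private:
  double grace_;
  double max_tick_gap_;
  double last_tick_ = 0.0;
  bool have_ticked_ = false;
  std::map<uint64_t, double> last_beacon_;
};
```

Note that the reset only fires when the gap between ticks is large. If
ticks keep arriving on schedule while beacons are not being recorded,
this sketch still reports the daemon as overdue once the grace passes.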

However, in this case tick() keeps getting called throughout; we're
just not seeing the beacons because they're held up waiting for
readable.

I could hack around this by only doing timeouts if *any* daemon has
successfully got a beacon through in the last (mds_beacon_grace*2) or
something like that, but I wonder if there's a Right way to handle
this for PaxosService subclasses?
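
To make the hack concrete, here is a minimal C++ sketch of what I have
in mind. It is only an illustration under assumptions: the
GuardedBeaconTracker class and the liveness_window parameter are
hypothetical names, nothing here exists in the mon, and times are
plain seconds.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical sketch of the workaround; not MDSMonitor code.
class GuardedBeaconTracker {
public:
  // grace: seconds a daemon may go without a processed beacon.
  // liveness_window: timeouts are enforced only if some daemon got a
  // beacon through within this many seconds.
  GuardedBeaconTracker(double grace, double liveness_window)
    : grace_(grace), liveness_window_(liveness_window) {
    assert(grace > 0 && liveness_window > 0);
  }

  // Record a beacon that was actually processed at time `now`.
  void on_beacon(uint64_t gid, double now) {
    last_beacon_[gid] = now;
    last_any_beacon_ = now;
    seen_any_ = true;
  }

  // Returns overdue daemons, or nothing if no beacon from anyone has
  // been processed recently (we are probably the ones stalled).
  std::vector<uint64_t> overdue(double now) const {
    std::vector<uint64_t> late;
    if (!seen_any_ || now - last_any_beacon_ > liveness_window_)
      return late;
    for (const auto &entry : last_beacon_) {
      if (now - entry.second > grace_)
        late.push_back(entry.first);
    }
    return late;
  }

private:
  double grace_;
  double liveness_window_;
  double last_any_beacon_ = 0.0;
  bool seen_any_ = false;
  std::map<uint64_t, double> last_beacon_;
};
```

The window is left as a parameter on purpose. In this simplified
model, a window longer than the grace would still let timeouts fire
between grace and window seconds after the last processed beacon, so
the right value needs some thought. The sketch also leaves the stale
timestamps alone once beacons start flowing again.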

Thanks,
John