Re: mds standby + standby-replay upgrade

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 30 Jun 2016 13:28:38 -0700



On Thu, Jun 30, 2016 at 1:03 PM, Dzianis Kahanovich <mahatma@xxxxxxx> wrote:
> Upgraded infernalis->jewel (git, Gentoo). The upgrade was done in one shot,
> by stopping and restarting everything globally.
>
> Infernalis: e5165: 1/1/1 up {0=c=up:active}, 1 up:standby-replay, 1 up:standby
>
> Now, after the post-upgrade start and the next mon restart, the active
> monitor crashes with "assert(info.state == MDSMap::STATE_STANDBY)" (even
> with no mds running). Fixed with this patch:
>
> --- a/src/mon/MDSMonitor.cc     2016-06-27 21:26:26.000000000 +0300
> +++ b/src/mon/MDSMonitor.cc     2016-06-28 10:44:32.000000000 +0300
> @@ -2793,7 +2793,11 @@ bool MDSMonitor::maybe_promote_standby(s
>      for (const auto &j : pending_fsmap.standby_daemons) {
>        const auto &gid = j.first;
>        const auto &info = j.second;
> -      assert(info.state == MDSMap::STATE_STANDBY);
> +//      assert(info.state == MDSMap::STATE_STANDBY);
> +      if (info.state != MDSMap::STATE_STANDBY) {
> +        dout(0) << "gid " << gid << " ex-assert(info.state == MDSMap::STATE_STANDBY) " << do_propose << dendl;
> +        return do_propose;
> +      }
>
>        if (!info.standby_replay) {
>          continue;
>
>
> Now: e5442: 1/1/1 up {0=a=up:active}, 1 up:standby
> - but really there are 3 mds (active, replay, standby).
>
> # ceph mds dump
> dumped fsmap epoch 5442
> fs_name cephfs
> epoch   5441
> flags   0
> created 2016-04-10 23:44:38.858769
> modified        2016-06-27 23:08:26.211880
> tableserver     0
> root    0
> session_timeout 60
> session_autoclose       300
> max_file_size   1099511627776
> last_failure    5239
> last_failure_osd_epoch  18473
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table}
> max_mds 1
> in      0
> up      {0=3104110}
> failed
> damaged
> stopped
> data_pools      5
> metadata_pool   6
> inline_data     disabled
> 3104110:        10.227.227.103:6800/14627 'a' mds.0.5436 up:active seq 30
> 3084126:        10.227.227.104:6800/24069 'c' mds.0.0 up:standby-replay seq 1
>
>
> With standby-replay set to false, all is OK: 1/1/1 up {0=a=up:active}, 2 up:standby
>
> How can I fix this behaviour with 3 mds?

Ah, you hit a known bug with that assert. I thought the fix was
already in the latest point release; are you behind?
-Greg
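
For readers following the quoted patch, here is a minimal standalone sketch of
the idea behind it: a standby entry in an unexpected state is logged and
tolerated rather than asserted on. This is not Ceph code. The State enum, the
Info struct and replay_candidates() are hypothetical stand-ins invented for
illustration, and the sketch skips the offending entry with "continue", whereas
the quoted patch returns from the function early.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the monitor's data; not the real MDSMap/FSMap API.
enum class State { Standby, StandbyReplay, Active };

struct Info {
  std::string name;
  State state;
  bool standby_replay;  // daemon is configured to act as a standby-replay
};

// Returns the gids of daemons that are plain standbys and configured for
// standby-replay. An entry in any other state is reported and skipped
// instead of aborting the whole process.
std::vector<std::uint64_t> replay_candidates(
    const std::map<std::uint64_t, Info> &standby_daemons) {
  std::vector<std::uint64_t> out;
  for (const auto &j : standby_daemons) {
    const auto &gid = j.first;
    const auto &info = j.second;
    if (info.state != State::Standby) {
      std::cerr << "gid " << gid << " '" << info.name
                << "' is not in standby state; skipping" << std::endl;
      continue;
    }
    if (!info.standby_replay) {
      continue;
    }
    out.push_back(gid);
  }
  return out;
}
```

Fed a map that contains an entry like the 'c' daemon from the dump above
(already in standby-replay state), the function logs that entry and carries on
with the remaining standbys.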