Re: cephfs-snapshots causing mds failover, hangs

"Yan, Zheng" <ukernel@xxxxxxxxx> · Mon, 26 Aug 2019 20:55:03 +0800

On Mon, Aug 26, 2019 at 6:57 PM thoralf schulze <t.schulze@xxxxxxxxxxxx> wrote:
>
> hi Zheng,
>
> On 8/21/19 4:32 AM, Yan, Zheng wrote:
> > Please enable debug mds (debug_mds=10), and try reproducing it again.
>
> please find the logs at
> https://www.user.tu-berlin.de/thoralf.schulze/ceph-debug.tar.xz .
>
> we managed to reproduce the issue as a worst-case scenario: before
> snapshotting, juju-d0f708-5-lxd-1 and juju-d0f708-10-lxd-1 were the
> active mds's and juju-d0f708-3-lxd-1 and juju-d0f708-9-lxd-1 the standbys.
> we created the snapshot at ~08:11:50; a little later the failover
> happened and juju-d0f708-5-lxd-1 and juju-d0f708-10-lxd-1 went MIA. a
> little later still, the now-active juju-d0f708-3-lxd-1 and
> juju-d0f708-9-lxd-1 mds's dropped out of the cluster as well. we started
> restarting all mds daemons at ~08:16.
>
> thank you very much & with kind regards,
> t.
>

I tracked down the bug:
https://tracker.ceph.com/issues/41434

Thanks
Yan, Zheng
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com