Hi,
We recently stumbled upon a problem with the kernel-based CephFS driver
(Ubuntu Trusty with the 4.4.0-18 kernel from the xenial LTS backport package).
Our MDS failed for an unknown reason, and the standby MDS became active.
After rejoining the MDS cluster, the former standby MDS got stuck in the
clientreplay state. Clients were not able to connect to it, and we had to
fail back to the original MDS to recover them (a sketch of that fail-back
step follows after the log below):
[Wed Apr 27 11:17:48 2016] ceph: mds0 hung
[Wed Apr 27 11:36:30 2016] ceph: mds0 came back
[Wed Apr 27 11:36:30 2016] ceph: mds0 caps went stale, renewing
[Wed Apr 27 11:36:30 2016] ceph: mds0 caps stale
[Wed Apr 27 11:36:33 2016] libceph: mds0 192.168.6.132:6809 socket closed (con state OPEN)
[Wed Apr 27 11:36:38 2016] libceph: mds0 192.168.6.132:6809 connection reset
[Wed Apr 27 11:36:38 2016] libceph: reset on mds0
[Wed Apr 27 11:36:38 2016] ceph: mds0 closed our session
[Wed Apr 27 11:36:38 2016] ceph: mds0 reconnect start
[Wed Apr 27 11:36:39 2016] ceph: mds0 reconnect denied
[Wed Apr 27 12:03:32 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state OPEN)
[Wed Apr 27 12:03:33 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:34 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:35 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:37 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:41 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:50 2016] ceph: mds0 reconnect start
[Wed Apr 27 12:03:50 2016] ceph: mds0 reconnect success
[Wed Apr 27 12:03:55 2016] ceph: mds0 recovery completed
(192.168.6.132 being the standby MDS)
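For reference, failing the stuck daemon so that the other MDS takes over
again can be done with the "ceph mds fail" CLI; below is a minimal Python
sketch of that step. The default rank 0 is an assumption, and our actual
fail-back may have looked slightly different, so treat it as an illustration
only.

#!/usr/bin/env python
# Sketch: mark an MDS rank as failed so that a standby takes over.
# Assumes the "ceph" CLI is installed and the local keyring permits
# "mds fail". Rank 0 is an assumption; pass the rank of the stuck
# daemon as the first argument instead.
import subprocess
import sys

rank = sys.argv[1] if len(sys.argv) > 1 else "0"
# "ceph mds fail <rank>" tells the monitors to treat the daemon holding
# this rank as failed, allowing another MDS to claim the rank.
subprocess.check_call(["ceph", "mds", "fail", rank])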
The problem is similar to the one described in this mail thread from
September:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004070.html
My questions are:
- Does a recent kernel include the fix to react to MDS map changes?
- If so, which upstream kernel release includes the changes?
- Is it possible to manipulate the MDS map manually, e.g. via
/sys/kernel/debug/ceph/<client>/mdsmap? (A read-only dump sketch follows
after this list.)
- Does using a second MDS in an active/active setup provide a way to handle
this situation, even though that configuration is not recommended (yet)?
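Regarding the debugfs question above: I am not sure whether the mdsmap file
is writable at all, so the sketch below only dumps each kernel client's
current view of the MDS map. It assumes debugfs is mounted at
/sys/kernel/debug and that each per-client directory exposes an "mdsmap"
file (the layout may differ between kernel versions); it needs to run as
root.

#!/usr/bin/env python
# Sketch: dump the kernel CephFS client's view of the MDS map via debugfs.
# The path pattern is an assumption based on the per-client directories
# the kernel client creates under /sys/kernel/debug/ceph.
import glob

for path in glob.glob("/sys/kernel/debug/ceph/*/mdsmap"):
    print("=== %s ===" % path)
    try:
        with open(path) as f:
            print(f.read().rstrip())
    except (IOError, OSError) as exc:
        print("could not read %s: %s" % (path, exc))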
Regards,
Burkhard