Enforce MDS map update in CephFS kernel driver

Hi,

we recently stumbled upon a problem with the kernel-based CephFS driver (Ubuntu Trusty with the 4.4.0-18 kernel from the Xenial LTS backport package). Our active MDS failed for an unknown reason, and the standby MDS became active.

After rejoining the MDS cluster, the former standby MDS got stuck in the clientreplay state, and clients were not able to connect to it. We had to fail back to the original MDS to recover the clients:

[Wed Apr 27 11:17:48 2016] ceph: mds0 hung
[Wed Apr 27 11:36:30 2016] ceph: mds0 came back
[Wed Apr 27 11:36:30 2016] ceph: mds0 caps went stale, renewing
[Wed Apr 27 11:36:30 2016] ceph: mds0 caps stale
[Wed Apr 27 11:36:33 2016] libceph: mds0 192.168.6.132:6809 socket closed (con state OPEN)
[Wed Apr 27 11:36:38 2016] libceph: mds0 192.168.6.132:6809 connection reset
[Wed Apr 27 11:36:38 2016] libceph: reset on mds0
[Wed Apr 27 11:36:38 2016] ceph: mds0 closed our session
[Wed Apr 27 11:36:38 2016] ceph: mds0 reconnect start
[Wed Apr 27 11:36:39 2016] ceph: mds0 reconnect denied
[Wed Apr 27 12:03:32 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state OPEN)
[Wed Apr 27 12:03:33 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:34 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:35 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:37 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:41 2016] libceph: mds0 192.168.6.132:6800 socket closed (con state CONNECTING)
[Wed Apr 27 12:03:50 2016] ceph: mds0 reconnect start
[Wed Apr 27 12:03:50 2016] ceph: mds0 reconnect success
[Wed Apr 27 12:03:55 2016] ceph: mds0 recovery completed

(192.168.6.132 being the standby MDS)
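
In case it matters, here is a rough sketch of how such a failback can be triggered with the standard ceph CLI (rank 0 is taken from the mds0 messages above, and an admin keyring on the host is assumed; this is only an illustration, not a transcript of what we ran):

#!/usr/bin/env python
# Sketch only: show the current MDS state and force rank 0 to fail over.
# Assumes the standard 'ceph' CLI and an admin keyring on this host;
# rank 0 is taken from the "mds0" messages in the log above.
import json
import subprocess

def ceph_json(*args):
    # Run a ceph CLI command and parse its JSON output.
    out = subprocess.check_output(("ceph", "--format", "json") + args)
    return json.loads(out.decode("utf-8"))

# Which daemon currently holds rank 0, and in which state is it?
print(json.dumps(ceph_json("mds", "stat"), indent=2))

# Mark rank 0 as failed so that the other MDS takes over again.
subprocess.check_call(["ceph", "mds", "fail", "0"])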

The problem is similar to the one described in this mailing list thread from September:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004070.html

My questions are:

- Does a recent kernel include the fix to react to MDS map changes?
- If this is the case, which upstream kernel release includes the changes?
- Is it possible to manipulate the MDS map manually, e.g. via /sys/kernel/debug/ceph/<client>/mdsmap (see the sketch below for what I mean)?
- Does using a second MDS in an active/active setup provide a way to handle this situation, even though that configuration is not recommended (yet)?
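
To make the third question more concrete, the sketch below simply dumps the MDS map as the kernel client currently sees it (the glob pattern is a guess at the <fsid>.client<id> directory name under debugfs; as far as I can tell the entry is read-only, hence the question whether the map can be manipulated at all):

#!/usr/bin/env python
# Sketch only: dump the MDS map as the kernel client currently sees it.
# The glob pattern is an assumption; the real directory under
# /sys/kernel/debug/ceph is named <fsid>.client<id>. As far as I can
# tell this entry is read-only, hence my question about manipulating it.
import glob

for path in glob.glob("/sys/kernel/debug/ceph/*/mdsmap"):
    print("==> %s" % path)
    with open(path) as f:
        print(f.read())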

Regards,
Burkhard


