CephFS + CTDB/Samba - MDS session timeout on lockfile

Nick Fisk <nick@xxxxxxxxxx> · Mon, 9 May 2016 16:31:00 +0100

Hi All,

I've been testing an active/active Samba cluster over CephFS. Performance
seems really good with small files compared to Gluster, and soft reboots
work beautifully with little to no interruption in file access. However,
when I perform a hard shutdown/reboot of one of the Samba nodes, the
remaining node detects that the other Samba node has disappeared but then
eventually bans itself. If I leave everything for around 5 minutes, CTDB
unbans itself and everything carries on running.

From what I can work out, because the MDS still has a stale session from
the powered-down node, it won't let the remaining node access the CTDB lock
file (which also sits on the CephFS). CTDB meanwhile is hammering away
trying to access the lock file, but it sees what it thinks is a split-brain
scenario because something still holds a lock on the lockfile, and so bans
itself.
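
To put a number on how long the lock stays unavailable after a hard power-off,
something like the following rough Python sketch could be run on the surviving
node. It is not what CTDB itself does internally; it only retries a
non-blocking fcntl lock on a file and reports how long it took to get it. The
function names, the timeout and the retry interval are all made up for
illustration. Note also that on CephFS the open() or lock call may simply block
while the MDS waits on the stale session, rather than failing straight away,
in which case the elapsed time still tells you how long the stall lasted.

```python
import errno
import fcntl
import os
import time


def try_lock(path):
    """Try once to take an exclusive, non-blocking fcntl lock on path.

    Returns the open file descriptor on success, or None if some other
    holder still owns the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        # EACCES/EAGAIN mean "locked by someone else"; anything else is a
        # real error and is re-raised.
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None
        raise
    return fd


def wait_for_lock(path, timeout=600.0, interval=1.0):
    """Retry try_lock() until it succeeds or timeout seconds have passed.

    Returns (fd, elapsed_seconds); fd is None if the lock was never
    obtained. The caller must os.close(fd) to release the lock.
    """
    start = time.monotonic()
    while True:
        fd = try_lock(path)
        elapsed = time.monotonic() - start
        if fd is not None:
            return fd, elapsed
        if elapsed >= timeout:
            return None, elapsed
        time.sleep(interval)
```

For example, calling wait_for_lock() on a scratch file in the same CephFS
directory as the CTDB lock file (not the live lock file itself), with another
process on the node about to be powered off holding a lock on that scratch
file, should show whether the stall lines up with the roughly 5 minutes it
takes CTDB to recover.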

I'm guessing the solution is to either reduce the mds session timeout or
increase the amount of time/retries for CTDB, but I'm not sure what's the
best approach. Does anyone have any ideas?

Nick
