On Mon, 9 May 2016, Nick Fisk wrote:
> Hi All,
>
> I've been testing an active/active Samba cluster over CephFS, and performance
> seems really good with small files compared to Gluster. Soft reboots work
> beautifully with little to no interruption in file access. However, when I
> perform a hard shutdown/reboot of one of the Samba nodes, the remaining node
> detects that the other Samba node has disappeared but then eventually bans
> itself. If I leave everything for around 5 minutes, CTDB unbans itself and
> then everything continues running.
>
> From what I can work out, it looks like the MDS still has a stale session from
> the powered-down node, so it won't let the remaining node access the CTDB lock
> file (which is also sitting on the CephFS). CTDB meanwhile is hammering
> away trying to access the lock file, but it sees what it thinks is a split
> brain scenario because something still has a lock on the lock file, and so
> bans itself.
>
> I'm guessing the solution is to either reduce the MDS session timeout or
> increase the amount of time/retries for CTDB, but I'm not sure what's the
> best approach. Does anyone have any ideas?

I believe Ira was looking at this exact issue and addressed it by lowering
mds_session_timeout to 30 seconds?

sage
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
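
To make the trade-off between the two suggested fixes concrete, here is a
rough Python sketch of the timing race described above. It is only a toy
model, not CTDB or Ceph code: the function name, the retry interval and the
retry count are hypothetical and do not correspond to real CTDB tunables. It
assumes the lock becomes available once the stale MDS session has timed out,
and that the surviving node gives up (bans itself) after a fixed number of
evenly spaced attempts.

```python
def lock_wait_outcome(session_timeout_s, retry_interval_s, max_retries):
    """Toy model: does the stale session expire before the survivor gives up?

    session_timeout_s -- seconds until the MDS drops the dead node's session
                         (assumed to be when the lock file becomes available)
    retry_interval_s  -- hypothetical gap between lock attempts
    max_retries       -- hypothetical number of attempts before self-banning
    """
    # Total time the surviving node keeps trying before it bans itself.
    give_up_after_s = retry_interval_s * max_retries
    if session_timeout_s <= give_up_after_s:
        return "recovers"
    return "bans"


if __name__ == "__main__":
    # Illustrative numbers only: 30 s is the value mentioned in the reply,
    # 300 s roughly matches the "around 5 minutes" observed in the report.
    for timeout in (30, 300):
        print(timeout, lock_wait_outcome(timeout, retry_interval_s=5, max_retries=10))
```

In this model either change helps: shortening the session timeout or giving
CTDB more time or retries both move the outcome from "bans" to "recovers".
Which one is the better fix on a real cluster is exactly the open question in
this thread.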