Hi All,

I've been testing an active/active Samba cluster over CephFS, and performance with small files seems really good compared to Gluster. Soft reboots work beautifully, with little to no interruption in file access.

However, when I perform a hard shutdown/reboot of one of the Samba nodes, the remaining node detects that the other Samba node has disappeared but then eventually bans itself. If I leave everything for around 5 minutes, CTDB unbans itself and everything continues running.

From what I can work out, because the MDS still has a stale session from the powered-down node, it won't let the remaining node access the CTDB lock file (which also sits on the CephFS). CTDB meanwhile is hammering away trying to access the lock file, but it sees what it thinks is a split-brain scenario because something still holds a lock on the lock file, and so it bans itself.

I'm guessing the solution is either to reduce the MDS session timeout or to increase the amount of time/retries for CTDB, but I'm not sure which is the best approach. Does anyone have any ideas?

Nick
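
To make the trade-off between the two options concrete, here is a minimal timing sketch. It is only an illustration, not CTDB or Ceph code: the function names and the retry figures are hypothetical, and the only number taken from the behaviour described above is the roughly 5 minutes it takes for things to recover. The idea is simply that the surviving node stays up only if its lock retries outlast the time the stale session keeps the lock held.

```python
def lock_retry_window(attempts: int, interval_s: float) -> float:
    """Total time (seconds) a node keeps retrying the lock before giving up.

    Hypothetical model: a fixed number of attempts at a fixed interval.
    """
    return attempts * interval_s


def survives_stale_session(stale_session_s: float, attempts: int, interval_s: float) -> bool:
    """True if the retries last longer than the stale session holds the lock."""
    return lock_retry_window(attempts, interval_s) > stale_session_s


# ~300 s matches the "around 5 minutes" observed above.
# The retry counts and intervals are made-up values for illustration only.
OBSERVED_STALE_S = 300.0

# Option 0: leave both sides alone (assumed 10 retries, 5 s apart).
print(survives_stale_session(OBSERVED_STALE_S, attempts=10, interval_s=5.0))

# Option 1: shorten the stale-session window on the MDS side (assumed 30 s).
print(survives_stale_session(30.0, attempts=10, interval_s=5.0))

# Option 2: lengthen the retry window on the CTDB side (assumed 70 retries).
print(survives_stale_session(OBSERVED_STALE_S, attempts=70, interval_s=5.0))
```

Either change closes the gap in this model. Which one is safer in practice depends on how quickly a genuinely dead client should lose its locks versus how long a real split brain could go undetected.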