> Here is what the setup was:

I thought I'd share an update in case it helps others. Your ideas inspired me to try a different approach.

We support 4 main distros (and 2 variants of some). Where possible, we try not to provide our own versions of distro-supported packages like CTDB, so a concern with modifying services is that our changes could be replaced by package updates. There are ways to mitigate that, but that thought, combined with your ideas, led me to try this:

- Make sure the ctdb service is disabled.
- Add a systemd service of my own, oneshot, that runs a helper script.
- The helper script first ensures the gluster volumes show up (I use localhost in my case; besides, in our environment we don't want CTDB to have a public IP until NFS can be served, so this helps there too).
- Even with the gluster volume showing good, the first attempts to mount gluster volumes during init startup fail. So the helper script keeps looping until they work. It seems they work on the 2nd try (after a 3 s sleep on failure).
- Once the mounts are confirmed working and mounted, the helper starts the ctdb service.
- Awkward CTDB problems (where the lock check sometimes fails to detect a lock problem) are avoided, since we won't start CTDB until we're 100% sure the gluster lock is mounted and pointing at gluster.

The above is working in prototype form, so I'm going to start adding my bind mounts to the equation. I think I have a solution that will work now, and I thank you so much for the ideas. I'm taking things from prototype form on to something we can provide to people.

With regards to pacemaker: there are a few pacemaker solutions I've touched, and one I even helped implement. Now, it could be that I'm not an expert at writing rules, but pacemaker has often given us more trouble than the problems it solves. I believe this is due to the complexity and power of the software. I am not knocking pacemaker.
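To make the flow concrete, here is a rough sketch of what such a helper script could look like. This is my illustration, not Erik's actual script: the volume name, mount point, and script filename are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper for the oneshot unit; VOL, MNT, and the script
# name are illustrative assumptions, not the author's actual values.

VOL=ctdb-lock              # assumed gluster volume holding the CTDB lock
MNT=/run/gluster/lock      # assumed mount point for that volume

# retry CMD...: run CMD until it succeeds, sleeping 3 s after each failure.
retry() {
    until "$@"; do
        sleep 3
    done
}

main() {
    # 1. Wait until glusterd reports the volume as present.
    retry gluster volume status "$VOL"

    # 2. Early in init the first mount attempts fail, so keep retrying;
    #    per the description above, the 2nd try usually succeeds.
    retry sh -c "mountpoint -q '$MNT' || mount -t glusterfs localhost:/$VOL '$MNT'"

    # 3. Only now start CTDB: the lock path is confirmed to be on gluster,
    #    avoiding the lock-check false negatives mentioned above.
    systemctl start ctdb
}

# Guard so the functions can be sourced without side effects; the oneshot
# unit would invoke the script by name.
if [ "${0##*/}" = "ctdb-gluster-start.sh" ]; then
    main "$@"
fi
```

The accompanying oneshot unit would just point `ExecStart=` at this script with `Type=oneshot` and `RemainAfterExit=yes`, ordered `After=glusterd.service`, while the stock ctdb unit stays disabled.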
However, a person really has to be a pacemaker expert not to make a mistake that could cause downtime, so I have tried to avoid pacemaker in the new solution. I know there are downsides -- fencing is there for a reason -- but as far as I can tell the decision has been right for us. CTDB is less complicated, even if it does not provide 100% true full HA capabilities.

That said, I've been careful to future-proof the solution for a move to pacemaker. For example, on the gluster/NFS servers I bring up IP aliases (interfaces) on the network where the BMCs reside, so we could later switch seamlessly to pacemaker with IPMI/BMC/Redfish fencing without causing too much pain in the field with deployed servers.

I do realize there are tools to help configure pacemaker for you. Some that I've tried have given me mixed results, perhaps due to the complexity of the networking setup in our solutions. As we deploy this to more locations, I'll get a feel for whether a move to pacemaker is right or not. I just share this in the interest of learning; I'm always willing to learn and improve if I've overlooked something.

Erik

________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users