On Wed, May 9, 2018 at 6:26 AM, Andrew Price <anprice@xxxxxxxxxx> wrote:
> [linux-cluster@ isn't really used nowadays; CCing users@clusterlabs]
>
> On 08/05/18 12:18, Jason Gauthier wrote:
>>
>> Greetings,
>>
>> I'm working on a setup of a two-node cluster with shared storage.
>> I've been able to see the storage on both nodes, and appropriate
>> configuration for fencing the block device.
>>
>> The next step was getting DLM and GFS2 in a clone group to mount the
>> FS on both drives. This is where I am running into trouble.
>>
>> As far as the OS goes, it's Debian. I'm using pacemaker, corosync,
>> and crm for cluster management.
>
> Is it safe to assume that you're using Debian Wheezy? (The need for
> gfs_controld disappeared in the 3.3 kernel.) As Wheezy goes end-of-life
> at the end of the month I would suggest upgrading; you will likely find
> the cluster tools more user-friendly and the components more stable.

I am using stretch, which was the challenge at first: I couldn't find any
information about it. Even a release as new as Jessie contains
gfs2_controld, and I could not figure out how to make it work. But, yeah,
that is now removed, because it works fine without it.

And the good news is: I messed around with this for quite some time last
night and finally got everything to come up reliably on both nodes. Even
reboots, and simultaneous reboots. So, I am pleased! Time for the next
part, which is building some VMs.

Thanks for the help!

>> At the moment, I've removed the gfs2 parts just to try and get dlm
>> working.
>>
>> My current config looks like this:
>>
>> node 1084772368: alpha
>> node 1084772369: beta
>> primitive p_dlm_controld ocf:pacemaker:controld \
>>     op monitor interval=60 timeout=60 \
>>     meta target-role=Started args=-K
>> primitive p_gfs_controld ocf:pacemaker:controld \
>>     params daemon=gfs_controld \
>>     meta target-role=Started
>> primitive stonith_sbd stonith:external/sbd \
>>     params pcmk_delay_max=30 sbd_device="/dev/sdb1"
>> group g_gfs2 p_dlm_controld p_gfs_controld
>> clone cl_gfs2 g_gfs2 \
>>     meta interleave=true target-role=Started
>> property cib-bootstrap-options: \
>>     have-watchdog=false \
>>     dc-version=1.1.16-94ff4df \
>>     cluster-infrastructure=corosync \
>>     cluster-name=zeta \
>>     last-lrm-refresh=1525523370 \
>>     stonith-enabled=true \
>>     stonith-timeout=20s
>>
>> When I bring the resources up, I get a quick blip in my logs:
>>
>> May  8 07:13:58 beta dlm_controld[9425]: 253556 dlm_controld 4.0.7 started
>> May  8 07:14:00 beta kernel: [253558.641658] dlm: closing connection
>> to node 1084772369
>> May  8 07:14:00 beta kernel: [253558.641764] dlm: closing connection
>> to node 1084772368
>>
>> This is the same messaging I see when I run dlm manually and then stop
>> it. My challenge here is that I cannot find out what dlm is doing.
>> I've tried adding -K to /etc/default/dlm, but I don't think that file
>> is being respected. I would like to figure out how to increase the
>> verbose output of dlm_controld so I can see why it won't stay running
>> when it's launched through the cluster. I haven't been able to figure
>> out how to pass arguments directly to a daemon in the primitive
>> config, if it's even possible. Otherwise, I would try to pass -K there.
>>
>> Thanks!
>>
>> Jason

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
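For the archives: the argument-passing question above can likely be answered through the agent's instance parameters rather than meta attributes. What follows is a sketch only, assuming the ocf:pacemaker:controld agent exposes `args` and `daemon` as instance parameters (check the actual parameter list on your system with `crm ra info ocf:pacemaker:controld`), and that your dlm_controld build reads /etc/dlm/dlm.conf as described in dlm.conf(5).

```shell
# Sketch, not a verified config. Note args goes under "params", not
# "meta" -- in the config quoted above, args=-K sits in the meta
# section, where the agent will not see it.
crm configure primitive p_dlm_controld ocf:pacemaker:controld \
    params args="-K" \
    op monitor interval=60 timeout=60 \
    meta target-role=Started

# On stretch (kernel >= 3.3) gfs_controld no longer exists, so the
# clone only needs dlm_controld:
crm configure clone cl_dlm p_dlm_controld \
    meta interleave=true target-role=Started

# Independently of how the daemon is launched, dlm_controld 4.x can be
# made more verbose via its config file (assumed path/option per
# dlm.conf(5)):
echo "log_debug=1" >> /etc/dlm/dlm.conf
```

With debug logging on, dlm_controld's startup and exit reasons should show up in syslog/journal (e.g. `journalctl -u pacemaker` or /var/log/daemon.log on Debian), which should make the "quick blip" above easier to diagnose.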