I reported last week that I was getting "permission denied" when pcs was
starting a gfs2 resource. I thought it was due to the resource being defined
incorrectly, but that doesn't appear to be the case. On rare occasions the
mount works, but most of the time one node gets it mounted while the other
is denied. I've enabled a number of logging options and run straces on both
sides, but I'm not getting anywhere.

My cluster looks like:

# pcs resource show
 Clone Set: dlm-clone [dlm]
     Started: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]
 Resource Group: apachegroup
     VirtualIP  (ocf::heartbeat:IPaddr2):    Started
     Website    (ocf::heartbeat:apache):     Started
     httplvm    (ocf::heartbeat:LVM):        Started
     http_fs    (ocf::heartbeat:Filesystem): Started
 Clone Set: clvmd-clone [clvmd]
     Started: [ rh7cn1.devlab.sinenomine.net rh7cn2.devlab.sinenomine.net ]
 Clone Set: clusterfs-clone [clusterfs]
     Started: [ rh7cn1.devlab.sinenomine.net ]
     Stopped: [ rh7cn2.devlab.sinenomine.net ]

The gfs2 resource is defined:

# pcs resource show clusterfs
 Resource: clusterfs (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/vg_cluster/ha_lv directory=/mnt/gfs2-demo fstype=gfs2 options=noatime
  Operations: start interval=0s timeout=60 (clusterfs-start-timeout-60)
              stop interval=0s timeout=60 (clusterfs-stop-timeout-60)
              monitor interval=10s on-fail=fence (clusterfs-monitor-interval-10s)

When the mount is attempted on node 2 the log contains:

Oct 13 11:10:42 rh7cn2 kernel: GFS2: fsid=rh7cluster:vol1: Trying to join cluster "lock_dlm", "rh7cluster:vol1"
Oct 13 11:10:42 rh7cn2 corosync[47978]: [QB    ] ipc_setup.c:handle_new_connection:485 IPC credentials authenticated (47978-48271-30)
Oct 13 11:10:42 rh7cn2 corosync[47978]: [QB    ] ipc_shm.c:qb_ipcs_shm_connect:294 connecting to client [48271]
Oct 13 11:10:42 rh7cn2 corosync[47978]: [QB    ] ringbuffer.c:qb_rb_open_2:236 shm size:1048589; real_size:1052672; rb->word_size:263168
Oct 13 11:10:42 rh7cn2 corosync[47978]: message repeated 2 times: [[QB    ] ringbuffer.c:qb_rb_open_2:236 shm size:1048589; real_size:1052672; rb->word_size:263168]
Oct 13 11:10:42 rh7cn2 corosync[47978]: [MAIN  ] ipc_glue.c:cs_ipcs_connection_created:272 connection created
Oct 13 11:10:42 rh7cn2 corosync[47978]: [CPG   ] cpg.c:cpg_lib_init_fn:1532 lib_init_fn: conn=0x2ab16a953a0, cpd=0x2ab16a95a64
Oct 13 11:10:42 rh7cn2 corosync[47978]: [CPG   ] cpg.c:message_handler_req_exec_cpg_procjoin:1349 got procjoin message from cluster node 0x2 (r(0) ip(172.17.16.148) ) for pid 48271
Oct 13 11:10:43 rh7cn2 kernel: GFS2: fsid=rh7cluster:vol1: Joined cluster. Now mounting FS...
Oct 13 11:10:43 rh7cn2 corosync[47978]: [CPG   ] cpg.c:message_handler_req_lib_cpg_leave:1617 got leave request on 0x2ab16a953a0
Oct 13 11:10:43 rh7cn2 corosync[47978]: [CPG   ] cpg.c:message_handler_req_exec_cpg_procleave:1365 got procleave message from cluster node 0x2 (r(0) ip(172.17.16.148) ) for pid 48271
Oct 13 11:10:43 rh7cn2 corosync[47978]: [CPG   ] cpg.c:message_handler_req_lib_cpg_finalize:1655 cpg finalize for conn=0x2ab16a953a0
Oct 13 11:10:43 rh7cn2 dlm_controld[48271]: 251492 cpg_dispatch error 9

Is the "leave request" symptomatic or causal? If the latter, why is it generated?
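One thing I plan to try, to see whether the leave comes from the mount itself
or from the Filesystem resource agent, is to take Pacemaker out of the way and
mount by hand (a sketch; the device, mountpoint, and options are taken from the
resource definition above, and unmanaging the clone is my assumption about how
to keep Pacemaker from reacting):

```shell
# On rh7cn2, keep Pacemaker hands-off while testing:
pcs resource unmanage clusterfs-clone

# Attempt the same mount the Filesystem agent would perform:
mount -t gfs2 -o noatime /dev/vg_cluster/ha_lv /mnt/gfs2-demo

# Watch for the join/leave sequence in the kernel log:
dmesg | tail -20

# Clean up and hand control back to Pacemaker:
umount /mnt/gfs2-demo
pcs resource manage clusterfs-clone
```

If the manual mount fails the same way, that would point at dlm/gfs2 rather
than the agent.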
On the other side:

Oct 13 11:10:41 rh7cn1 corosync[10423]: [QUORUM] vsf_quorum.c:message_handler_req_lib_quorum_getquorate:395 got quorate request on 0x2ab0e33c8b0
Oct 13 11:10:41 rh7cn1 corosync[10423]: [QUORUM] vsf_quorum.c:message_handler_req_lib_quorum_getquorate:395 got quorate request on 0x2ab0e33c8b0
Oct 13 11:10:42 rh7cn1 corosync[10423]: [CPG   ] cpg.c:message_handler_req_exec_cpg_procjoin:1349 got procjoin message from cluster node 0x2 (r(0) ip(172.17.16.148) ) for pid 48271
Oct 13 11:10:43 rh7cn1 kernel: GFS2: fsid=rh7cluster:vol1.0: recover generation 6 done
Oct 13 11:10:43 rh7cn1 corosync[10423]: [CPG   ] cpg.c:message_handler_req_exec_cpg_procleave:1365 got procleave message from cluster node 0x2 (r(0) ip(172.17.16.148) ) for pid 48271
Oct 13 11:10:43 rh7cn1 kernel: GFS2: fsid=rh7cluster:vol1.0: recover generation 7 done

dlm_tool dump shows:

251469 dlm:ls:vol1 conf 2 1 0 memb 1 2 join 2 left
251469 vol1 add_change cg 6 joined nodeid 2
251469 vol1 add_change cg 6 counts member 2 joined 1 remove 0 failed 0
251469 vol1 stop_kernel cg 6
251469 write "0" to "/sys/kernel/dlm/vol1/control"
251469 vol1 check_ringid done cluster 43280 cpg 1:43280
251469 vol1 check_fencing done
251469 vol1 send_start 1:6 counts 5 2 1 0 0
251469 vol1 receive_start 1:6 len 80
251469 vol1 match_change 1:6 matches cg 6
251469 vol1 wait_messages cg 6 need 1 of 2
251469 vol1 receive_start 2:1 len 80
251469 vol1 match_change 2:1 matches cg 6
251469 vol1 wait_messages cg 6 got all 2
251469 vol1 start_kernel cg 6 member_count 2
251469 dir_member 1
251469 set_members mkdir "/sys/kernel/config/dlm/cluster/spaces/vol1/nodes/2"
251469 write "1" to "/sys/kernel/dlm/vol1/control"
251469 vol1 prepare_plocks
251469 vol1 set_plock_data_node from 1 to 1
251469 vol1 send_all_plocks_data 1:6
251469 vol1 send_all_plocks_data 1:6 0 done
251469 vol1 send_plocks_done 1:6 counts 5 2 1 0 0 plocks_data 0
251469 vol1 receive_plocks_done 1:6 flags 2 plocks_data 0 need 0 save 0
251470 dlm:ls:vol1 conf 1 0 1 memb 1 join left 2
251470 vol1 add_change cg 7 remove nodeid 2 reason leave
251470 vol1 add_change cg 7 counts member 1 joined 0 remove 1 failed 0
251470 vol1 stop_kernel cg 7
251470 write "0" to "/sys/kernel/dlm/vol1/control"
251470 vol1 purged 0 plocks for 2
251470 vol1 check_ringid done cluster 43280 cpg 1:43280
251470 vol1 check_fencing done
251470 vol1 send_start 1:7 counts 6 1 0 1 0
251470 vol1 receive_start 1:7 len 76
251470 vol1 match_change 1:7 matches cg 7
251470 vol1 wait_messages cg 7 got all 1
251470 vol1 start_kernel cg 7 member_count 1
251470 dir_member 2
251470 dir_member 1
251470 set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/vol1/nodes/2"
251470 write "1" to "/sys/kernel/dlm/vol1/control"
251470 vol1 prepare_plocks

I would appreciate any debugging suggestions. I've straced dlm_controld and
corosync but not gained much clarity.

Neale

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
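P.S. For reference, the extra tracing I'm attempting next (a sketch; -D is the
dlm_controld foreground/debug flag per its man page, and running the daemon by
hand assumes the dlm-clone resource has been stopped first so the two don't
collide):

```shell
# Free the daemon from Pacemaker so it can be run by hand (assumption):
pcs resource disable dlm-clone

# Run dlm_controld in the foreground with debug output to stderr:
dlm_controld -D &

# List the CPG groups corosync currently knows about, to see whether
# the mount's cpg connection appears and then vanishes:
corosync-cpgtool

# After a failed mount attempt, dump the daemon's debug buffer again:
dlm_tool dump
```

If anyone has better knobs to turn than these, I'm all ears.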