Thanks Bob, answers inline...

On 10/13/14, 12:16 PM, "Bob Peterson" <rpeterso@xxxxxxxxxx> wrote:
>----- Original Message -----
>> I would appreciate any debugging suggestions. I've straced
>> dlm_controld/corosync but not gained much clarity.
>>
>> Neale
>
>Hi Neale,
>
>1. What does it say if you try to mount the GFS2 file system manually
>   rather than from the configured service?

Permission denied (I also used "pcs resource debug-start" and that's the
message it gets as well). I disabled the resource and then tried mounting
it by hand (exact command at the bottom of this mail); that succeeded once
but not a second time. As I mentioned, on rare occasions both sides do
mount on cluster start-up, which is worse than it never mounting!

>2. After the failure, what does dmesg on all the nodes show?

Node 1 -

[256184.632116] dlm: vol1: dlm_recover 15
[256184.633300] dlm: vol1: add member 2
[256184.636944] dlm: vol1: dlm_recover_members 2 nodes
[256184.664495] dlm: vol1: generation 8 slots 2 1:1 2:2
[256184.664531] dlm: vol1: dlm_recover_directory
[256184.668865] dlm: vol1: dlm_recover_directory 0 in 0 new
[256184.703328] dlm: vol1: dlm_recover_directory 10 out 1 messages
[256184.784404] dlm: vol1: dlm_recover 15 generation 8 done: 120 ms
[256184.785050] GFS2: fsid=rh7cluster:vol1.0: recover generation 8 done
[256185.375091] dlm: vol1: dlm_recover 17
[256185.375655] dlm: vol1: dlm_clear_toss 1 done
[256185.376263] dlm: vol1: remove member 2
[256185.376339] dlm: vol1: dlm_recover_members 1 nodes
[256185.376403] dlm: vol1: generation 9 slots 1 1:1
[256185.376430] dlm: vol1: dlm_recover_directory
[256185.376458] dlm: vol1: dlm_recover_directory 0 in 0 new
[256185.376490] dlm: vol1: dlm_recover_directory 0 out 0 messages
[256185.376638] dlm: vol1: dlm_recover_purge 6 locks for 1 nodes
[256185.376664] dlm: vol1: dlm_recover_masters
[256185.376714] dlm: vol1: dlm_recover_masters 0 of 26
[256185.376746] dlm: vol1: dlm_recover_locks 0 out
[256185.376778] dlm: vol1: dlm_recover_locks 0 in
[256185.376831] dlm: vol1: dlm_recover_rsbs 26 done
[256185.377444] dlm: vol1: dlm_recover 17 generation 9 done: 0 ms
[256185.377833] GFS2: fsid=rh7cluster:vol1.0: recover generation 9 done

Node 2 (failing) -

[256206.973005] GFS2: fsid=rh7cluster:vol1: Trying to join cluster "lock_dlm", "rh7cluster:vol1"
[256206.973105] GFS2: fsid=rh7cluster:vol1: In gdlm_mount
[256207.019743] dlm: vol1: joining the lockspace group...
[256207.169061] dlm: vol1: group event done 0 0
[256207.169135] dlm: vol1: dlm_recover 1
[256207.170735] dlm: vol1: add member 2
[256207.170822] dlm: vol1: add member 1
[256207.174493] dlm: vol1: dlm_recover_members 2 nodes
[256207.174798] dlm: vol1: join complete
[256207.205167] dlm: vol1: dlm_recover_directory
[256207.208924] dlm: vol1: dlm_recover_directory 10 in 10 new
[256207.245335] dlm: vol1: dlm_recover_directory 0 out 1 messages
[256207.329101] dlm: vol1: dlm_recover 1 generation 8 done: 120 ms
[256207.851390] GFS2: fsid=rh7cluster:vol1: Joined cluster. Now mounting FS...
[256207.881216] dlm: vol1: leaving the lockspace group...
[256207.947479] dlm: vol1: group event done 0 0
[256207.949530] dlm: vol1: release_lockspace final free

>3. What kernel is this?
>
>I would:
>(1) Check to make sure the file system has enough journals for all nodes.
>    You can do gfs2_edit -p journals <device>. If your version of
>    gfs2-utils doesn't have that option, you can alternately do:
>    gfs2_edit -p jindex <device> and see how many journals are in the
>    index.
3/3 [fc7745eb] 4/21 (0x4/0x15): File    journal0
4/4 [8b70757d] 5/4127 (0x5/0x101f): File    journal1

It was made via:

mkfs.gfs2 -j 2 -J 16 -r 32 -t rh7cluster:vol1 /dev/mapper/vg_cluster-ha_lv

>(2) Check to make sure the locking protocol is lock_dlm in the file system
>    superblock. You can get that from gfs2_edit -p sb <device>

sb_lockproto    lock_dlm

>(3) Check to make sure the cluster name in the file system superblock
>    matches the configured cluster name. That's also in the superblock

sb_locktable    rh7cluster:vol1

Strangely, while /etc/corosync/corosync.conf has the cluster name
specified, pcs status reports it as blank:

# pcs status
Cluster name:
Last updated: Mon Oct 13 12:40:47 2014
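In case it helps, a quick cross-check of the name corosync itself is
running with (to compare against the rh7cluster half of sb_locktable)
would be something along these lines; I believe corosync.conf's
cluster_name ends up under the totem.cluster_name cmap key:

# corosync-cmapctl | grep cluster_name

If that comes back empty or with a different value, it might explain the
blank name in pcs status.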
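For reference, the by-hand mount mentioned above is essentially the
following (/mnt/vol1 is just my test mount point; the device is the same
LV the file system was made on):

# mount -t gfs2 /dev/mapper/vg_cluster-ha_lv /mnt/vol1

That is the command that returns "permission denied".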
Attachment: default.xml