On Fri, Nov 13, 2009 at 09:13:17AM -0600, Alan A wrote:
> On Thu, Nov 12, 2009 at 5:49 PM, David Teigland <teigland@xxxxxxxxxx> wrote:
>
> > On Thu, Nov 12, 2009 at 04:22:17PM -0600, Alan A wrote:
> > > Here are the packages that caused the lockup:
> > >
> > > [root@fenmrdev02 ~]# rpm -qa | grep sg3
> > > sg3_utils-libs-1.25-4.el5
> > > sg3_utils-1.25-4.el5
> > > sg3_utils-devel-1.25-4.el5
> >
> > These packages are unrelated to the gfs_controld errors.
> >
> > > > Nov 12 15:28:20 fenmrdev04 ntpd[3340]: kernel time sync enabled 0001
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 a11
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 surv34
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 account61
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 acct63
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 gfs_web
> > > > Nov 12 15:28:26 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 cati_gfs
> > > > Nov 12 15:28:27 fenmrdev04 gfs_controld[2935]: retrieve_plocks: ckpt open error 12 gfs_cmdr
> >
> > These may or may not create problems. To figure out why they happened
> > we'd need to see "group_tool dump gfs" from each of the nodes.
> >
> > Dave
>
> Here is what I started with and where I am today.
>
> I had only one node out of three able to mount GFS (the cluster has nodes
> 2, 3, and 4). The other two nodes (2 and 4) would tell me that
> /dev/mapper/gfsshare was not a block device. I looked into what had changed
> and found that the November 5th update had installed sg3_utils on the two
> nodes that had trouble mounting GFS. I also found (I am not sure how this
> happened) that node 4 had the scsi_reserve service running.
> As soon as I removed it, a simple reboot allowed me to mount GFS on node 4,
> but node 2 still had the same problem and the same errors. I checked whether
> a SCSI reservation key was active on any of the volumes, but no luck: no key
> was returned on any of the GFS volumes.
>
> Today, something different.....
>
> I am not sure what is going on, but I can't mount GFS on all three nodes. I
> was able to mount it on node 2, but then I restarted node 3 and everything
> went to hell again.
>
> Here is the output from gfs_tool dump at the time when GFS was mounted:

The retrieve_plocks errors are a harmless side effect of the failing mount
syscalls, which are returning ENODEV.

Are you using fence_scsi? I'm guessing not, since you didn't have sg3_utils
until now. As bizarre as it may sound, it seems that init.d/scsi_reserve may
be applying SCSI reservations to your devices, which you don't want, of
course, and which would explain the mount errors. I don't know how or why
scsi_reserve is running, but you need to disable it (again, assuming you're
not using fence_scsi for your cluster).

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
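For anyone hitting the same symptoms, the checks discussed in this thread can be run with a few commands on each node. This is only a sketch for a RHEL 5 cluster: /dev/mapper/gfsshare is the device named above, so substitute your own shared LUN, and the commands assume root and the sg3_utils package.

```shell
# Is the scsi_reserve init script enabled? It is only needed when
# fence_scsi is in use.
chkconfig --list scsi_reserve

# Inspect SCSI-3 persistent reservations and registered keys on the
# shared device (sg_persist is part of sg3_utils):
sg_persist --in --read-reservation /dev/mapper/gfsshare
sg_persist --in --read-keys /dev/mapper/gfsshare

# If you are NOT using fence_scsi, stop and disable the service so it
# does not register/reserve the devices again at the next boot:
service scsi_reserve stop
chkconfig scsi_reserve off

# Collect the debug output Dave asked for, on every node:
group_tool dump gfs > /tmp/group_gfs.$(hostname).txt
```

These commands act on cluster hardware and SysV services, so run them on the cluster nodes themselves; the reservation checks are read-only and safe to run while the cluster is up.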