On Wed, 2008-02-13 at 09:23 +0100, Ferenc Wagner wrote:
> Thanks! This patch indeed fixed the hang. But of course not the
> mount:
>
> Trying to join cluster "lock_dlm", "pilot:test"
> Joined cluster. Now mounting FS...
> GFS: fsid=pilot:test.4294967295: can't mount journal #4294967295
> GFS: fsid=pilot:test.4294967295: there are only 6 journals (0 - 5)

Hi Ferenc,

The "4294967295" is really a -1, which is a bad return code on the
mount. So it should be a process of elimination to find out what went
wrong. Several possibilities come to mind:

1. Is it possible that your file system has a different cluster name
   ("pilot") from the cluster name in your cluster.conf file?
2. Perhaps there is another gfs file system with the same name "test"
   already mounted?
3. Perhaps it can't find the locking protocol, lock_dlm (I hope)?
   Make sure lock_dlm shows up in lsmod.
4. Perhaps gfs can't find the rest of the cluster infrastructure?
   Check to make sure you did "service cman start" and have aisexec
   running on the system having the problem.

Also, check /var/log/messages for messages pertaining to cluster
problems.

It sounds to me like we should have a better error message for
whatever went wrong. Let's figure that out first, and then we can go
about improving the error messages with a bugzilla if needed.

We have improved the error messages considerably from earlier. I don't
know what version of gfs2-utils you have, but that package contains
the common mount helper (/sbin/mount.gfs2 is a hard link to
/sbin/mount.gfs) that does some of this error processing when mounts
fail. So a newer version of the mount helper may be better at pointing
out what it doesn't like about your file system.

> # gfs_tool jindex /dev/mapper/gfs-test
> gfs_tool: /dev/mapper/gfs-test is not a GFS file/filesystem
>
> Scary. What may be the problem? The other node is using this
> volume... Can even unmount/remount it. Though in dmesg it says:

I wouldn't call it scary at all.
It sounds like gfs_tool may be somewhat confused about the mount
point. Try giving it the mount point that was used on the mount
command, not the /dev/mapper device, and see if that helps.

I've actually been working on a better version of that code too, both
kernel and userland, that improves how gfs_tool finds mount points.
For RHEL5, they're bugzillas 431951 (gfs_tool) and 431952 (kernel)
respectively. Those changes have not been shipped yet, due to code
freeze, but patches are in the bugzilla records.

As for all the kernel dmesgs you noted, that's perfectly normal. When
you mount a gfs file system, it runs through all the journals
regardless, checking whether they are clean or need to be replayed, so
that's all those kernel messages mean. They're not locked (well, they
are, but only for a couple of seconds).

Regards,

Bob Peterson
Red Hat GFS

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
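
A note on the "4294967295" in the quoted mount messages: that number
is what a 32-bit -1 looks like when it is printed as unsigned. The
Python sketch below is only an illustration of that reading, not
anything taken from the gfs code. The helper names (as_signed32,
parse_fsid) are made up for this example, and it assumes the trailing
number in the fsid is the journal id, which the quoted messages
suggest but do not state.

```python
import re
import struct

# Hypothetical pattern for the "fsid=<cluster>:<fsname>.<number>" text seen
# in the quoted kernel messages; the field names are this sketch's own.
FSID_RE = re.compile(r"fsid=(?P<cluster>[^:]+):(?P<fs>[^.]+)\.(?P<jid>\d+)")


def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def parse_fsid(line):
    """Return (cluster name, fs name, signed trailing number) or None."""
    match = FSID_RE.search(line)
    if match is None:
        return None
    return (
        match.group("cluster"),
        match.group("fs"),
        as_signed32(int(match.group("jid"))),
    )


line = "GFS: fsid=pilot:test.4294967295: can't mount journal #4294967295"
print(parse_fsid(line))  # ('pilot', 'test', -1)
```

The first two fields are the cluster name and file system name to
compare against cluster.conf and the list of already mounted gfs file
systems, as in points 1 and 2 of the reply.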