Dear all,
I am running GFS 6.1 with DLM on a cluster (4 nodes + front-end) of
dual-headed Opterons under RHEL4U3. Because of some problems (kernel
panic...) I had to hard-reboot some nodes of the cluster. Now some GFS
partitions simply won't mount. On some nodes, the mount waits forever
to join the GFS mount group (see the output below).
So three questions:
- What exactly is the join waiting for? The cluster status is fine and
every node is a member.
- What does the Code column (here S-2,2,4) in the cman_tool output mean?
- What can I do to restart this cluster?
NB: Before testing this (below) I rebooted the complete cluster and
gfs_fsck'ed all nodes, with everything unmounted.
----------------------------------------------------------------------------------------------------
root # service clvmd start
root # service gfs start
Mounting GFS filesystems: # hangs forever!
In another console I get:
root # dmesg | tail
...
GFS: fsid=globcover:baieGC2b.0: jid=14: Done
GFS: fsid=globcover:baieGC2b.0: jid=15: Trying to acquire journal lock...
GFS: fsid=globcover:baieGC2b.0: jid=15: Looking at journal...
GFS: fsid=globcover:baieGC2b.0: jid=15: Done
GFS: Trying to join cluster "lock_dlm", "globcover:baieGC3a"
root # cman_tool services
Service          Name          GID LID State  Code
Fence Domain:    "default"      11   2 run    -
[1 5 4 3 2]
DLM Lock Space:  "clvmd"        12   3 run    -
[1 5 4 3 2]
DLM Lock Space:  "baieGC2b"     13   4 run    -
[1 5]
DLM Lock Space:  "baieGC3a"     15   6 run    -
[1 5 2 4 3]
GFS Mount Group: "baieGC2b"     14   5 run    -
[1 5]
GFS Mount Group: "baieGC3a"      0   7 join   S-2,2,4
[]
----------------------------------------------------------------------------------------------------
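In case it helps anyone reproduce this, here is a minimal sketch (not part of the cluster tools) that filters the cman_tool services output for services whose State is not "run". The function name stuck_services is made up for illustration. It assumes the column layout shown above, with State and Code as the last two fields of each service line and no spaces inside service names.

```shell
# Hypothetical helper: reads `cman_tool services` output on stdin and
# prints name, state and code of every service not in the "run" state.
# Assumption: service lines contain a quoted name and end with the
# GID, LID, State and Code columns; member lists like "[1 5]" are skipped.
stuck_services() {
    awk '/"/ && $(NF-1) != "run" { print $(NF-4), $(NF-1), $NF }'
}
```

On a node it could be run as `cman_tool services | stuck_services`. With the output above it prints only the baieGC3a mount group, with its "join" state and its code.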
Thanks,
--
------------------------------------------------------------------------
Fernando NIÑO CNES - BPi 2102
Medias-France/IRD 18, Av. Edouard Belin
Tél: 05.61.27.40.74 31401 Toulouse Cedex 9