> [...cut...]
> certain level (heavy I/O), it collapses. About 20% of the nodes do crash
> (not reacting any more, but no sign of kernel panic), the others can't
> access the gfs resource.
> [...cut...]
>
> [root@compute-0-6 ~]# cat /proc/cluster/services
> Service          Name                GID  LID  State    Code
> Fence Domain:    "default"             3    2  recover  4 -
> [1 2 6 10 9 8 3 7 4 11]
> DLM Lock Space:  "clvmd"               7    3  recover  0 -
> [1 2 6 10 9 8 3 7 4 11]
> DLM Lock Space:  "Magma"              12    5  recover  0 -
> [1 2 6 10 9 8 3 7 4 11]
> DLM Lock Space:  "homeneu"            17    6  recover  0 -
> [10 9 8 7 2 3 6 4 1 11]
> GFS Mount Group: "homeneu"            18    7  recover  0 -
> [10 9 8 7 2 3 6 4 1 11]
> User:            "usrm::manager"      11    4  recover  0 -
> [1 2 6 10 9 8 3 7 4 11]

Hello,

1. Do you have Fibre-Channel SAN storage, or do you use GNBD/iSCSI?
2. The other nodes can't access the GFS filesystem because the cluster is in the "recover" state. Do you have fencing properly configured?

Best Regards
Maciej Bogucki

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
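For context on question 2: on Red Hat Cluster Suite of this generation (the /proc/cluster/services era), fencing is configured per node in /etc/cluster/cluster.conf, and an unfenceable failed node will leave the fence domain stuck in "recover" and block GFS access cluster-wide. A minimal sketch of such a configuration follows; the node name is taken from the prompt above, while the fence device, agent (fence_apc is one real agent among many), port, and credentials are hypothetical placeholders:

```xml
<!-- Hypothetical fencing fragment for /etc/cluster/cluster.conf.
     Device name, agent, address, and credentials are placeholders. -->
<clusternodes>
  <clusternode name="compute-0-6" votes="1">
    <fence>
      <!-- Method "1" is tried first; each node needs at least one
           working method, or recovery stalls as seen above. -->
      <method name="1">
        <device name="apc-pdu-1" port="6"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice name="apc-pdu-1" agent="fence_apc"
               ipaddr="10.0.0.250" login="fenceuser" passwd="fencepass"/>
</fencedevices>
```

If a node has no fence method (or the agent cannot reach the device), fenced cannot confirm the node is dead, and the remaining members will wait in "recover" indefinitely rather than risk GFS corruption.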