Re: Fwd: GFS volume hangs on 3 nodes after gfs_grow

Thanks again for the prompt response, Bob.

I restored the nodes to a healthy state and they can access the GFS volumes again.
node3:
service gfs status
Configured GFS mountpoints:
/lvm_test1
/lvm_test2
Active GFS mountpoints:
/lvm_test1
/lvm_test2

node4:
service gfs status
Configured GFS mountpoints:
/lvm_test1
/lvm_test2
Active GFS mountpoints:
/lvm_test1
/lvm_test2

node2 - luci node:
service gfs status
Configured GFS mountpoints:
/lvm_test1
/lvm_test2
Active GFS mountpoints:
/lvm_test1
/lvm_test2


I will try to reproduce the problem with gfs_grow.

One more question regarding GFS: what steps would you recommend (if any) for growing and shrinking an active GFS volume?
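
For reference, the grow sequence we followed from the RHEL 5.2 manual
was roughly this (the volume group and logical volume names here are
just placeholders from our test setup):

lvextend -L +5G /dev/test_vg/lvm_test1
gfs_grow /lvm_test1

lvextend grows the clustered logical volume, and gfs_grow then expands
the mounted GFS file system into the new space. That covers growing;
I haven't found anything in the manual about shrinking, which is part
of why I'm asking.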

On Fri, Sep 26, 2008 at 12:44 PM, Bob Peterson <rpeterso@xxxxxxxxxx> wrote:
----- "Alan A" <alan.zg@xxxxxxxxx> wrote:
| Thanks again, Bob.
|
| No kernel panic on any of the nodes. I had to cold boot all 3 nodes
| in order to get the cluster going (might have been a fence issue but
| am not 100% sure, since we use only SCSI fencing until we agree on a
| secondary fencing method). What is 'scary' is that the gfs_grow
| command paralyzed that volume on all 3 nodes, and I couldn't access,
| nor unmount, nor run gfs_fsck, from any of the nodes. We will do
| more testing on this. Btw, do you have a suggested "safe" method of
| growing and shrinking the volume other than what is noted in the 5.2
| documentation (since we followed the RHEL manual)? If the GFS volume
| hangs, what is the best way to try and unmount it from a node; would
| 'gfs_freeze' have helped?

Hi Alan,

No, gfs_freeze won't help.  In these cases, it's probably best to
reboot the node that caused the problem, either with /sbin/reboot -fin
or by throwing the power switch.  I suspect that clvmd status
hung because of the earlier problem.
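
In case the flags aren't familiar, that's:

/sbin/reboot -fin   # -f: force, -i: shut down network interfaces, -n: don't sync first

The idea behind -n is to skip the sync, since a sync against the hung
GFS mount could itself hang.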

I'm not aware of any problems in your version of gfs_grow that can
cause this kind of lockup.  It's designed to be run seamlessly while
other processes are using the file system, and that's the kind of
thing we test regularly.

If you figure out how to recreate the lockup, let me know so I
can try it out.  Of course, if this is a production cluster, I
would not take it out of production for a long time to try this.
But if I can recreate the problem here, I'll file a bugzilla
record and get it fixed.

Regards,

Bob Peterson
Red Hat Clustering & GFS




--
Alan A.
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
