On Thu, Sep 30, 2004 at 04:01:44PM -0700, micah nerren wrote:
> Hi,
>
> I have a SAN with 4 file systems on it, each GFS. These are mounted
> across various servers running GFS, 3 of which are lock_gulm servers.
> This is on RHEL WS 3 with GFS-6.0.0-7.1 on x86_64.

How many nodes?

> One of the file systems simply will not mount now. The other 3 mount
> and unmount fine. They are all part of the same cca. I have my master
> lock server running in heavy debug mode but none of the output from
> lock_gulmd tells me anything about this one bad pool. How can I figure
> out what is going on, any good debug or troubleshooting steps I should
> do? I think if I just reboot everything it will settle down, but we
> can't do that just yet, as the master lock server happens to be on a
> production box right now.

1) Are you certain that you have uniquely named all four filesystems?
   You can use gfs_tool to verify that there are no duplicate names
   (rough example commands at the bottom of this message).

2) Is there an expired node that has not been fenced holding a lock on
   that filesystem? gulm_tool will help there (also below).

3) Did you ever have all 4 filesystems mounted at the same time on the
   same node? I.e., did it "all of a sudden" stop working, or was it
   always failing?

> Also, is there a way to migrate a master lock server to a slave lock
> server? In other words, can I force the master to become a slave and a
> slave to become the new master?

Restarting lock_gulmd on the master will cause one of the slaves to take
over as master and the old master to come back up as a slave. Note that
this only works when you have a dedicated gulm server. If you have an
embedded master server (a gulm server that is also mounting GFS), bad
things will happen when that server restarts.

> Thanks!
>
> --
>
> Linux-cluster@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/linux-cluster

--
Adam Manthei  <amanthei@xxxxxxxxxx>
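
(For reference, roughly the kind of invocations meant in 1) and 2) above;
the device and host names are placeholders, so adjust them for your setup
and check the man pages for the exact syntax on your GFS version:

   # Print the lock table name stored in a filesystem's superblock.
   # Run it against each of the four pool devices; all four names
   # should be unique.
   gfs_tool sb /dev/pool/<poolname> table

   # List the nodes known to the master lock server and their states.
   # Look for anything marked expired that has not been fenced yet.
   gulm_tool nodelist <master-lock-server>
)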