Hiya,

On Fri, 2004-10-01 at 08:24, Adam Manthei wrote:
> On Thu, Sep 30, 2004 at 04:01:44PM -0700, micah nerren wrote:
> > Hi,
> >
> > I have a SAN with 4 file systems on it, each GFS. These are mounted
> > across various servers running GFS, 3 of which are lock_gulm servers.
> > This is on RHEL WS 3 with GFS-6.0.0-7.1 on x86_64.
>
> How many nodes?

4 servers in total mount the 4 file systems; 3 of them are lock_gulm
servers.

> > One of the file systems simply will not mount now. The other 3 mount and
> > unmount fine. They are all part of the same cca. I have my master lock
> > server running in heavy debug mode, but none of the output from
> > lock_gulmd tells me anything about this one bad pool. How can I figure
> > out what is going on? Any good debug or troubleshooting steps I should
> > try? I think if I just reboot everything it will settle down, but we
> > can't do that just yet, as the master lock server happens to be on a
> > production box right now.
>
> 1) Are you certain that you have uniquely named all four filesystems? You can
> use gfs_tool to verify that there are no duplicate names.

Yes, there are no duplicate names; every file system has a unique name.

> 2) Is there an expired node that is not fenced holding a lock on that
> filesystem? gulm_tool will help there.

No expired node. gulm_tool tells me everything is perfectly fine, which
fits with all the nodes being able to mount the other 3 file systems. I
have tried manually fencing and unfencing two of the systems, to no
avail.

> 3) Did you ever have all 4 filesystems mounted at the same time on the same
> node? i.e. did it "all of a sudden" stop working or was it always
> failing?

Yes, it had been running fine for several weeks and then "suddenly"
freaked out. It is possible the customer did something I am unaware of,
but I don't know what they could have done to cause this.

> > Also, is there a way to migrate a master lock server to a slave lock
> > server? In other words, can I force the master to become a slave and a
> > slave to become the new master?
>
> Restarting lock_gulmd on the master will cause one of the slaves to pick up
> as master and the master to come back up as a slave. Note that this only
> works when you have a dedicated gulm server. If you have an embedded master
> server (a gulm server also mounting GFS), bad things will happen when the
> server restarts.

Ugh, that's exactly what I need to avoid. I do not have dedicated gulm
servers; the master is on a machine that is also mounting the file
systems and is in heavy production use.

I am quite certain from past experience that just rebooting all 4
servers would fix this up, but I can't do that. What I am going to try
right now is blowing away the one pool that is acting up, rebuilding it,
and seeing if that works. Luckily this one pool is non-critical and is
backed up, so I can just nuke it.

Thanks,

Micah
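
For anyone chasing a similar problem, the checks discussed above
correspond roughly to commands like the following. This is only a
sketch: /dev/pool/pool_gfs01 stands in for whichever pool device will
not mount, lockmaster stands in for the master lock server's hostname,
and the exact gfs_tool/gulm_tool syntax can vary between GFS 6.0
releases, so check the man pages before running anything.

    # Print the lock table name stored in this pool's superblock; every
    # GFS filesystem in the cluster needs a unique table name.
    # (/dev/pool/pool_gfs01 is a placeholder device path.)
    gfs_tool sb /dev/pool/pool_gfs01 table

    # List the nodes the master lock server knows about and their
    # states, to spot any node that is expired but not yet fenced.
    # (lockmaster is a placeholder hostname.)
    gulm_tool nodelist lockmaster

    # On a *dedicated* gulm master only: restarting lock_gulmd lets a
    # slave take over as master and brings this node back as a slave.
    service lock_gulmd restart

Running the gfs_tool check against each of the four pools and comparing
the reported table names is the quickest way to rule out a
duplicate-name collision.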