On Wed, Mar 23, 2005 at 02:11:00PM +0200, Oved Ourfali wrote:
> I have GFS version 6 installed on RHEL ES3 Update 3.
> The GFS cluster includes 3 nodes: A, B and C.
>
> All three nodes run the lock_gulm daemon, so it runs in RLM mode.
>
> I have done some tests to check that GFS works correctly, and I
> ran into something very weird:
> Let's assume the master is A, and B and C are slaves.
> Disconnecting B or C from the network works fine.
>
> Disconnecting A causes a problem. Let's assume B tries to become the new
> master. B indicates that A is down, but for some reason it also thinks
> that C is down, so it waits for enough slaves to contact it, and that
> never happens. I tried to increase the timeout, and now it sometimes
> works and sometimes doesn't.
>
> Does anyone have a clue why this is happening?

For some reason C isn't finding B in time to let it know that it is
still alive.

So, first question: what values are you using for heartbeat_rate and
allowed_misses? Were you seeing this with the defaults (before you
increased the timeout), or are you using something else?

Also, you can add LoginLoops to the verbosity setting to have gulm
print out much more detail while it is trying to connect to and find
the master server.

--
Michael Conrad Tadpol Tilstra
BE ALERT!!!!  (The world needs more lerts ...)
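The symptom reads like a race between how long B waits for slaves to log in and how long C takes to notice that A is gone and go looking for a new master. The Python sketch below is a toy model of that race, not gulm's actual logic. It assumes that a would-be master needs a majority of the configured lock servers (itself included), and it assumes that a slave notices a dead master after roughly heartbeat_rate * allowed_misses seconds. The names `Timing`, `quorum_size`, `election_succeeds` and the `candidate_wait` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    heartbeat_rate: float  # seconds between heartbeats (hypothetical values)
    allowed_misses: int    # missed heartbeats before a peer is declared dead

    def detection_time(self) -> float:
        # Assumed model: a dead master is noticed after this many missed beats.
        return self.heartbeat_rate * self.allowed_misses


def quorum_size(num_servers: int) -> int:
    # Assumed rule: a strict majority of the configured lock servers.
    return num_servers // 2 + 1


def election_succeeds(num_servers: int, candidate_wait: float,
                      slave_timings: list[Timing]) -> bool:
    """Return True if the candidate hears from enough slaves before giving up.

    candidate_wait: how long the would-be master waits for logins (hypothetical).
    slave_timings: one Timing per surviving slave other than the candidate.
    """
    # A slave only logs in to the new master once it has noticed the old one died.
    on_time = sum(1 for t in slave_timings if t.detection_time() <= candidate_wait)
    return on_time + 1 >= quorum_size(num_servers)  # +1 counts the candidate


if __name__ == "__main__":
    c = Timing(heartbeat_rate=15.0, allowed_misses=2)
    print(election_succeeds(3, 30.0, [c]))  # C arrives just in time
    print(election_succeeds(3, 20.0, [c]))  # B gives up before C shows up
```

In this model, raising B's wait only helps once it clears C's detection time; real network jitter near that threshold would look exactly like "sometimes works and sometimes doesn't", which is why the heartbeat_rate and allowed_misses values matter here.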