Does anyone have any idea why incorrect entries in
/etc/hosts on the lock servers would only intermittently cause the "Errors trying to
login to LT000: ...1006:Not Allowed" messages? I would think that if these
entries were wrong, the client would *consistently* be refused entry to
the lockspace.
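For reference, the way I have been comparing what each lock server resolves for an affected client is roughly the following (the node names are placeholders for our real hostnames):

    # From an admin host, show what each lock server's resolver
    # (/etc/hosts via NSS) returns for a given client name.
    for h in lock1 lock2 lock3; do
        echo "== $h =="
        ssh "$h" getent hosts client1
    done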
Additionally, can anyone explain the fundamentals of GFS 6.0
lock tables and the locking process? A couple of specific questions I
have:

- What is the difference between LTPX and LT000?
- What is the advantage of having additional lock tables, and when would having more be a disadvantage?
- Is each lock propagated to every lock table, or is it held in only one table?
- Is the highwater mark per lock table, or is it the sum of locks across all lock tables?
Regards,
Britt Treece
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Britt Treece
Sent: Monday, March 05, 2007 10:51 PM
To: linux clustering
Subject: Re: RE: Errors trying to login to LT000: ...1006:Not Allowed
---
I am running a 13 node GFS (6.0.2.33) cluster with 10 mounting clients and 3 dedicated lock servers. The master lock server was rebooted and the next slave in the voting order took over. At that time 3 of the client nodes started receiving login errors from the LTPX server:
Mar 4 00:05:52 lock1 lock_gulmd_core[3798]: Master Node Is Logging Out NOW!
...
Mar 4 00:05:52 lock2 lock_gulmd_core[24627]: Master Node has logged out.
Mar 4 00:05:54 lock2 lock_gulmd_core[24627]: I see no Masters, So I am Arbitrating until enough Slaves talk to me.
Mar 4 00:05:54 lock2 lock_gulmd_LTPX[24638]: New Master at lock2 :192.168.1.3
Mar 4 00:05:56 lock2 lock_gulmd_core[24627]: Now have Slave quorum, going full Master.
Mar 4 00:11:39 lock2 lock_gulmd_core[24627]: Master Node Is Logging Out NOW!
...
Mar 4 00:05:52 client1 kernel: lock_gulm: Checking for journals for node "lock1 "
Mar 4 00:05:52 client1 lock_gulmd_core[9383]: Master Node has logged out.
Mar 4 00:05:52 client1 kernel: lock_gulm: Checking for journals for node "lock1 "
Mar 4 00:05:56 client1 lock_gulmd_core[9383]: Found Master at lock2 , so I'm a Client.
Mar 4 00:05:56 client1 lock_gulmd_core[9383]: Failed to receive a timely heartbeat reply from Master. (t:1172988356370685 mb:1)
Mar 4 00:05:56 client1 lock_gulmd_LTPX[9390]: New Master at lock2 :192.168.1.3
Mar 4 00:06:01 client1 lock_gulmd_LTPX[9390]: Errors trying to login to LT002: (lock2 :192.168.1.3) 1006:Not Allowed
Mar 4 00:06:01 client1 lock_gulmd_LTPX[9390]: Errors trying to login to LT000: (lock2 :192.168.1.3) 1006:Not Allowed
Mar 4 00:06:02 client1 lock_gulmd_LTPX[9390]: Errors trying to login to LT000: (lock2 :192.168.1.3) 1006:Not Allowed
Mar 4 00:06:02 client1 lock_gulmd_LTPX[9390]: Errors trying to login to LT002: (lock2 :192.168.1.3) 1006:Not Allowed
Mar 4 00:06:02 client1 lock_gulmd_LTPX[9390]: Errors trying to login to LT004: (lock2 :192.168.1.3) 1006:Not Allowed
Mar 4 00:06:02 client1 lock_gulmd_LTPX[9390]: Errors trying to login to LT001: (lock2 :192.168.1.3) 1006:Not Allowed
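For reference, the three dedicated lock servers are declared in our cluster.ccs roughly as follows (using the same node names as in the logs above; the heartbeat tuning is omitted):

    cluster {
        name = "gfscluster"
        lock_gulm {
            servers = ["lock1", "lock2", "lock3"]
        }
    }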
---
Britt
On 3/5/07 10:30 PM, "Treece, Britt" <Britt.Treece@xxxxxxxxxx> wrote:
All,
After much further investigation I found that /etc/hosts is off by one for these 3 client nodes on all 3 lock servers. Having fixed the typos, is it safe to assume that the root of the LTPX login problem is that /etc/hosts on the lock servers was wrong for these nodes? If so, why were these 3 clients allowed into the cluster when it was originally started, given that they had incorrect entries in /etc/hosts?
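To make the "off by one" concrete (the addresses and hostnames below are illustrative, not our real entries), the client lines on the lock servers were shifted so each name sat against the wrong address:

    # What the lock servers had (each name against the wrong address):
    192.168.1.11    client2
    192.168.1.12    client3
    192.168.1.13    client1

    # What they should have been:
    192.168.1.11    client1
    192.168.1.12    client2
    192.168.1.13    client3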
Regards,
Britt Treece
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster