Greg,
Looks like Sage has a fix for this problem. In case it matters, I have
seen a few cases that conflict with your notes in this thread and the
bug report.
I have seen the bug exclusively on new Ceph installs (without upgrading
from bobtail), so it is not isolated to upgrades.
Further, I have seen it on test deployments with a single monitor, so it
doesn't seem to be limited to deployments with a leader and followers.
Thanks for getting this bug moving forward.
Thanks,
Mike
On 4/18/2013 6:23 PM, Gregory Farnum wrote:
There's a little bit of python called ceph-create-keys, which is
invoked by the upstart scripts. You can kill the running processes,
and edit them out of the scripts, without direct harm. (Their purpose
is to create some standard keys which the newer deployment tools rely
on to do things like create OSDs, etc.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
On Thu, Apr 18, 2013 at 3:20 PM, Matthew Roy <imjustmatthew@xxxxxxxxx> wrote:
On 04/18/2013 06:03 PM, Joao Eduardo Luis wrote:
There are definitely some command messages being forwarded, but AFAICT
they're being forwarded to the monitor, not by the monitor, which by
itself is a good omen towards the monitor being the leader :-)
In any case, nothing in the trace's code path indicates we could be a
peon, unless the monitor believed itself to be the leader. If you take
a closer look, you'll see that we come from 'handle_last()', which is
bound to happen only on the leader (we'll assert otherwise). For the
monitor to be receiving these messages it must mean the peons believe
him to be the leader -- or we have so many bugs going around that it's
just madness!
In all seriousness, when I was chasing after this bug, Matthew sent me
his logs with higher debug levels -- no craziness going around :-)
-Joao
Is there a way to tell who's being "denied"? Even if it's just log
pollution I'd like to know which client is misconfigured. There are
similar messages in all the mon logs:
mon.a:
2013-04-18 18:16:51.254378 7fc7c6d10700 1 --
[2001:470:8:dd9::20]:6789/0 --> [2001:470:8:dd9::21]:6789/0 --
route(mon_command_ack([auth,get-or-create,client.admin,mon,allow
*,osd,allow *,mds,allow]=-13 access denied v775211) v1 tid 8867608) v2
-- ?+0 0x7fc61a18b160 con 0x253f700
mon.b:
2013-04-18 18:16:49.670758 7f37c7afa700 20 --
[2001:470:8:dd9::21]:6789/0 >> [2001:470:8:dd9::21]:0/22372
pipe(0x7f383c070b70 sd=90 :6789 s=2 pgs=1 cs=1 l=1).writer encoding 7
0x7f37f49876a0
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v775209) v1
(mon.c was removed since the first log file in the thread)
mon.d:
2013-04-18 18:16:51.304897 7f927d40f700 1 --
[2001:470:8:dd9:7271:bcff:febd:e398]:6789/0 --> client.?
[2001:470:8:dd9::21]:0/26333 --
mon_command_ack([auth,get-or-create,client.admin,mon,allow *,osd,allow
*,mds,allow]=-13 access denied v775211) v1 -- ?+0 0x7f923c0230a0
The spacing on these messages is about 0.001s so there's a lot of them
going around. All these systems are running 0.60-472-g327002e
Matthew
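On the question of who is being "denied": one rough way to narrow it down is to pull the peer address out of each "access denied" line and count how often each one appears. The sketch below is only an illustration built on the three log excerpts quoted above. It assumes each log record sits on a single line in the real log file (the wrapping here comes from the email) and that peers are printed as bracketed IPv6 addresses in the form [host]:port/nonce. The names PEER_RE and denied_peers are made up for this example and are not part of Ceph.

```python
import re
from collections import Counter

# Peer address that follows "-->" (message send) or ">>" (pipe debug line),
# optionally preceded by an entity label such as "client.?".
# Only bracketed IPv6 peers are handled, as in the excerpts above.
PEER_RE = re.compile(
    r"(?:-->|>>)\s+(?:[\w.?]+\s+)?"
    r"\[(?P<host>[0-9a-fA-F:.]+)\]:(?P<port>\d+)/(?P<nonce>\d+)"
)


def denied_peers(lines):
    """Count 'access denied' records per (host, port, nonce) peer."""
    counts = Counter()
    for line in lines:
        if "access denied" not in line:
            continue
        match = PEER_RE.search(line)
        if match:
            key = (match["host"], int(match["port"]), int(match["nonce"]))
            counts[key] += 1
    return counts


# The three excerpts from this thread, re-joined into one line each.
ACK = ("mon_command_ack([auth,get-or-create,client.admin,mon,allow *,"
       "osd,allow *,mds,allow]=-13 access denied ")
sample = [
    "2013-04-18 18:16:51.254378 7fc7c6d10700 1 -- "
    "[2001:470:8:dd9::20]:6789/0 --> [2001:470:8:dd9::21]:6789/0 -- "
    "route(" + ACK + "v775211) v1 tid 8867608) v2 -- ?+0 0x7fc61a18b160 "
    "con 0x253f700",
    "2013-04-18 18:16:49.670758 7f37c7afa700 20 -- "
    "[2001:470:8:dd9::21]:6789/0 >> [2001:470:8:dd9::21]:0/22372 "
    "pipe(0x7f383c070b70 sd=90 :6789 s=2 pgs=1 cs=1 l=1).writer encoding 7 "
    "0x7f37f49876a0 " + ACK + "v775209) v1",
    "2013-04-18 18:16:51.304897 7f927d40f700 1 -- "
    "[2001:470:8:dd9:7271:bcff:febd:e398]:6789/0 --> client.? "
    "[2001:470:8:dd9::21]:0/26333 -- " + ACK + "v775211) v1 -- ?+0 "
    "0x7f923c0230a0",
]

if __name__ == "__main__":
    for (host, port, nonce), hits in denied_peers(sample).most_common():
        print(f"{host} port={port} nonce={nonce}: {hits} denied")
```

In these excerpts every denied reply ends up at 2001:470:8:dd9::21, either directly to a peer on port 0 (the one mon.d labels client.?) or routed via the monitor on that host. The changing nonce (22372, 26333) might be the process ID of a short-lived client on that machine, though that reading is a guess on my part and not something the logs confirm.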