[Re-adding the list, so this is archived for posterity.]

On Wed, Oct 29, 2014 at 6:11 AM, Patrick Darley
<patrick.darley@xxxxxxxxxxxxxxx> wrote:
>
> Thanks again for the reply Greg!
>
> On 2014-10-28 17:39, Gregory Farnum wrote:
>>
>> I'm sorry, you're right -- I misread it. :(
>
>
> No worries, I had included some misleading words like "generate" in my
> rough description where "retrieve" would have been more appropriate. Sorry!
>
>> But indeed step 6 is the crucial one, which tells the existing
>> monitors to accept the new one into the cluster. You'll need to run it
>> with an admin client keyring that can connect to the existing cluster;
>> that's probably the part that has gone wrong. You don't need to run it
>> from the new monitor,
>
>
> I think, in order to carry out the 5th step you also need the client.admin
> keyring present, that'd be "preparing the monitor's data directory". I had
> scp-ed it across to the monitor along with the ceph.conf file and put them
> in the expected location, /etc/ceph/, prior to running that command.
>
>> so if you're having trouble getting the keys to
>> behave I'd just run it from an existing system. :)
>
>
> I tried running this command, step 6, from the admin node of my ubuntu
> ceph cluster. As I had experienced before, the command hung. Then, trying
> to run any ceph commands on the rest of the cluster, I get a long hang
> then the following error:
>
> cc@ucc01:~$ ceph -s
> 2014-10-29 10:40:33.748334 7ffaec051700 0 monclient(hunting): authenticate timed out after 300
> 2014-10-29 10:40:33.748499 7ffaec051700 0 librados: client.admin authentication error (110) Connection timed out
> Error connecting to cluster: TimedOut
>
>
> The monitor that I was trying to add can be started ok after this (once I
> have touched the done and sysvinit files) but also gives the above error
> when attempting to run ceph -s.
> Checking the log file I see the following lines repeated:
>
>
> 2014-10-29 10:01:01.721905 7ffd548ac700 0 mon.bcc07@-1(probing) e0 handle_probe ignoring fsid 5021163c-3c0b-4ec5-83fe-f0622c0e9447 != f2d609ef-2065-4862-a821-55c484d61dca
> 2014-10-29 10:01:01.809991 7ffd550ad700 1 mon.bcc07@-1(probing).paxos(paxos recovering c 0..0) is_readable now=2014-10-29 10:01:01.809996 lease_expire=0.000000 has v0 lc 0
> 2014-10-29 10:01:03.721559 7ffd548ac700 0 mon.bcc07@-1(probing) e0 handle_probe ignoring fsid 5021163c-3c0b-4ec5-83fe-f0622c0e9447 != f2d609ef-2065-4862-a821-55c484d61dca
> 2014-10-29 10:01:03.810466 7ffd550ad700 1 mon.bcc07@-1(probing).paxos(paxos recovering c 0..0) is_readable now=2014-10-29 10:01:03.810467 lease_expire=0.000000 has v0 lc 0
>
>
> The initial monitor has the following log at around a similar time:
>
>
> 2014-10-29 10:01:02.169655 7f52e7408700 0 mon.ucc01@1(probing) e2 handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca != 5021163c-3c0b-4ec5-83fe-f0622c0e9447
> 2014-10-29 10:01:04.170153 7f52e7408700 0 mon.ucc01@1(probing) e2 handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca != 5021163c-3c0b-4ec5-83fe-f0622c0e9447
> 2014-10-29 10:01:06.169300 7f52e7408700 0 mon.ucc01@1(probing) e2 handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca != 5021163c-3c0b-4ec5-83fe-f0622c0e9447
>
>
> It looks to me like there might be conflicting fsid values being compared
> somewhere, but checking the ceph.conf files on the nodes I found them to
> be declared as the same. The log files recorded a similar output on both
> monitors for some time.
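
To compare what each monitor is reporting, here is a rough Python sketch that pulls the two fsids out of "handle_probe ignoring fsid" lines like the ones above. It is only an illustration: the regular expression is inferred purely from the log lines quoted in this thread, and the name fsid_mismatches is made up. I'm assuming the first fsid is the one received from the peer and the second is the monitor's own, which fits the way the two logs mirror each other, but that is worth checking before relying on it.

```python
import re

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official log format). It expects each log entry on a single line,
# i.e. with the mail client's wrapping and "> " quote markers removed.
PROBE_RE = re.compile(
    r"mon\.(\w+)@\S+ \S+ handle_probe ignoring fsid "
    r"([0-9a-f-]{36}) != ([0-9a-f-]{36})"
)


def fsid_mismatches(log_text):
    """Map each monitor name to the set of (first, second) fsid pairs it logged.

    Hypothetical helper: 'first' is assumed to be the peer's fsid and
    'second' the monitor's own fsid.
    """
    result = {}
    for name, first, second in PROBE_RE.findall(log_text):
        result.setdefault(name, set()).add((first, second))
    return result


# Two of the lines from the logs above, unwrapped.
sample = (
    "2014-10-29 10:01:01.721905 7ffd548ac700 0 mon.bcc07@-1(probing) e0 "
    "handle_probe ignoring fsid 5021163c-3c0b-4ec5-83fe-f0622c0e9447 != "
    "f2d609ef-2065-4862-a821-55c484d61dca\n"
    "2014-10-29 10:01:02.169655 7f52e7408700 0 mon.ucc01@1(probing) e2 "
    "handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca != "
    "5021163c-3c0b-4ec5-83fe-f0622c0e9447\n"
)

mismatches = fsid_mismatches(sample)
```

Run over both log files, this makes it easy to see that each monitor is rejecting the other's fsid, whatever the ceph.conf files say.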
>
> I then turned off the monitor I was attempting to add at approximately
> 12:39:30 and the log file of the initial monitor has the following output
> around this time:
>
>
> 2014-10-29 12:39:30.304639 7f52e7408700 0 mon.ucc01@1(probing) e2 handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca != 5021163c-3c0b-4ec5-83fe-f0622c0e9447

Okay, that's indeed not right. I suspect this is your issue but I'm not
entirely certain because your other symptoms are a bit weird. I bet Joao
can help though; he maintains the monitor and deals with these issues a
lot more often than I do. :)
-Greg

> 2014-10-29 12:39:32.023964 7f52e7c09700 0 mon.ucc01@1(probing).data_health(1) update_stats avail 68% total 14318640 used 3748076 avail 9820180
> 2014-10-29 12:39:32.303740 7f52e7408700 0 mon.ucc01@1(probing) e2 handle_probe ignoring fsid f2d609ef-2065-4862-a821-55c484d61dca != 5021163c-3c0b-4ec5-83fe-f0622c0e9447
> 2014-10-29 12:39:32.394606 7f52e53fd700 0 -- 192.168.122.95:6789/0 >> 192.168.122.42:6789/0 pipe(0x55e5180 sd=24 :6789 s=2 pgs=1 cs=1 l=0 c=0x39bfde0).fault with nothing to send, going to standby
> 2014-10-29 12:39:33.862400 7f52e5902700 0 -- 192.168.122.95:6789/0 >> 192.168.122.42:6789/0 pipe(0x55e5180 sd=13 :6789 s=1 pgs=1 cs=2 l=0 c=0x39bfde0).fault
> 2014-10-29 12:40:32.024807 7f52e7c09700 0 mon.ucc01@1(probing).data_health(1) update_stats avail 68% total 14318640 used 3748072 avail 9820184
> 2014-10-29 12:41:32.025632 7f52e7c09700 0 mon.ucc01@1(probing).data_health(1) update_stats avail 68% total 14318640 used 3748072 avail 9820184
> 2014-10-29 12:42:32.027091 7f52e7c09700 0 mon.ucc01@1(probing).data_health(1) update_stats avail 68% total 14318640 used 3748072 avail 9820184
>
>
> It seems to me that the initial monitor is probing for the monitor I have
> tried to add, then, on starting the monitor it is probing for, there is
> some conflict with fsid values.
> Then when the monitor I am trying to add is stopped again, the initial
> monitor goes back to probing.
>
>
> Is this what the problem might be? And why would this have happened?
>
> If not, do you know what the problem might be?
>
> And any ideas about what I might have done wrong or what could be done
> differently?
>
> Thanks again,
>
> Patrick
>
>
>
>> On Tue, Oct 28, 2014 at 10:11 AM, Patrick Darley
>> <patrick.darley@xxxxxxxxxxxxxxx> wrote:
>>>
>>> On 2014-10-28 16:08, Gregory Farnum wrote:
>>>>
>>>>
>>>> On Mon, Oct 27, 2014 at 11:37 AM, Patrick Darley
>>>> <patrick.darley@xxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>
>>>>> Hi there
>>>>>
>>>>> Over the last week or so, I've been trying to connect a ceph monitor
>>>>> node running on a baserock system to a simple 3-node ubuntu ceph
>>>>> cluster.
>>>>>
>>>>> The 3 node ubuntu cluster was created by following the documented
>>>>> Quick installation guide using 3 VMs running ubuntu Trusty.
>>>>>
>>>>> After the ubuntu cluster has been deployed I would then follow the
>>>>> directions below, which I derived from comparing the ceph-deploy
>>>>> debug information, the ceph documentation on adding monitor nodes to
>>>>> an existing system and the ceph documentation on bootstrapping
>>>>> monitor nodes.
>>>>>
>>>>> 1. scp the /etc/ceph/* from admin node
>>>>> 2. create the dir: mkdir /var/lib/ceph/mon/ceph-bcc08
>>>>> 3. generate mon keyring: sudo ceph auth get mon. -o /var/lib/ceph/tmp/ceph-bcc08.mon.keyring
>>>>> 4. generate monmap: sudo ceph mon getmap -o /var/lib/ceph/tmp/monmap
>>>
>>>
>>>
>>>> Yeah, this is wrong. You're here giving the monitor its own keyring
>>>> which it is going to expect anybody to talk to to be encrypting with.
>>>
>>>
>>>
>>> If you are referring to steps 3 and 4 above, I believe these are
>>> synonymous with steps 3 and 4 of the documentation you recommended.
>>> The monitor keyring and the current monmap are retrieved from the
>>> initial monitor.
>>> They are then used in step 5 to prepare the monitor's data directory.
>>>
>>>
>>>> The docs have a section on adding monitors which should work verbatim;
>>>> if not it's a doc bug:
>>>>
>>>> http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-monitors
>>>
>>>
>>>
>>> Thanks for the recommendation. I have tried to use this procedure a
>>> couple of times but got stuck at step number 6 of this method. The
>>> command given hangs then times out, causing the rest of the cluster to
>>> fail.
>>>
>>>> -Greg
>>>
>>>
>>> Thanks for the reply!
>>>
>>> Much appreciated,
>>>
>>> Patrick
>>>
>>>
>>>
>>>>> 5. That filesystem thingy: sudo ceph-mon --cluster ceph --mkfs -i bcc08 --keyring /var/lib/ceph/tmp/ceph-bcc08.mon.keyring --monmap /var/lib/ceph/tmp/monmap
>>>>> 6. Unlink keys and old monmap: rm /var/lib/ceph/tmp/*
>>>>> 7. touch things: touch /var/lib/ceph/mon/ceph-bcc08/done and touch /var/lib/ceph/mon/ceph-bcc08/sysvinit
>>>>> 8. Then start the mon: sudo /etc/init.d/ceph start mon.bcc08
>>>>>
>>>>> When I carry out these steps in an attempt to add a baserock system
>>>>> to the ubuntu cluster, the monitor node is not added to the cluster
>>>>> and the admin socket mon_status gives the following output.
>>>>>
>>>>> ~ # ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.bcc07.asok mon_status
>>>>> { "name": "bcc07",
>>>>>   "rank": -1,
>>>>>   "state": "probing",
>>>>>   "election_epoch": 0,
>>>>>   "quorum": [],
>>>>>   "outside_quorum": [],
>>>>>   "extra_probe_peers": [],
>>>>>   "sync_provider": [],
>>>>>   "monmap": { "epoch": 0,
>>>>>       "fsid": "4460079d-42f4-4e3a-8ce3-e2a7fa2685e6",
>>>>>       "modified": "2014-10-27 12:37:25.531542",
>>>>>       "created": "2014-10-27 12:37:25.531542",
>>>>>       "mons": [
>>>>>             { "rank": 0,
>>>>>               "name": "ucc01",
>>>>>               "addr": "192.168.122.95:6789\/0"}]}}
>>>>>
>>>>>
>>>>> And the newly added monitor remains stuck in the probing state
>>>>> indefinitely. To try and resolve this issue I have looked at the
>>>>> monitor troubleshooting page of the ceph documentation, e.g. ntp
>>>>> synchronisation and checking network connectivity (to the best of my
>>>>> ability :-s ).
>>>>>
>>>>> It is also worth mentioning that I have created a 3 node ceph cluster
>>>>> on baserock machines (1 mon, 2 osds) then successfully added monitor
>>>>> nodes running baserock and ubuntu systems using the same 8 step
>>>>> process given above.
>>>>>
>>>>> This leaves me confused as to why adding the monitor run on baserock
>>>>> to the all ubuntu cluster specifically is causing problems.
>>>>>
>>>>> Are there any reasons why this 'probing' problem could be occurring?
>>>>> I'm feeling a little stuck on how to proceed and would welcome any
>>>>> suggestions.
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Patrick
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com