That looks like dmesg output from the libceph kernel module. Do you have the
libceph kernel module loaded? If the answer to that question is "yes", the
follow-up question is "Why?", as it is not required for a MON or OSD host.

On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
> Yeah, all three mons have OSDs on the same machines.
>
> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>
>> Is your primary MON running on the host which some OSDs are running on?
>>
>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>> <michael@xxxxxxxxxxxxx> wrote:
>> > Hi
>> >
>> > I am running a small cluster of 8 machines (80 osds), with three monitors on
>> > Ubuntu 16.04. Ceph version 10.2.5.
>> >
>> > I cannot reboot the monitors without physically going into the datacenter
>> > and power cycling them. What happens is that while shutting down, ceph gets
>> > stuck trying to contact the other monitors but networking has already shut
>> > down or something like that. I get an endless stream of:
>> >
>> > libceph: connect 10.20.0.10:6789 error -101
>> > libceph: connect 10.20.0.13:6789 error -101
>> > libceph: connect 10.20.0.17:6789 error -101
>> >
>> > where in this case 10.20.0.10 is the machine I am trying to shut down and
>> > all three IPs are the MONs.
>> >
>> > At this stage of the shutdown, the machine doesn't respond to pings, and I
>> > cannot even log in on any of the virtual terminals. Nothing to do but
>> > power off at the server.
>> >
>> > The other non-mon servers shut down just fine, and the cluster was healthy
>> > at the time I was rebooting the mon (I only reboot one machine at a time,
>> > waiting for it to come up before I do the next one).
>> >
>> > Also worth mentioning that if I execute
>> >
>> > sudo systemctl stop ceph\*.service ceph\*.target
>> >
>> > on the server, the only things I see are:
>> >
>> > root 11143 2 0 18:40 ? 00:00:00 [ceph-msgr]
>> > root 11162 2 0 18:40 ? 00:00:00 [ceph-watch-noti]
>> >
>> > and even then, when no ceph daemons are left running, doing a reboot goes
>> > into the same loop.
>> >
>> > I can't really find any mention of this online, but I feel someone must have
>> > hit this. Any idea how to fix it? It's really annoying because it's hard for
>> > me to get access to the datacenter.
>> >
>> > Thanks
>> > Michael
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>

--
Cheers,
Brad
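For anyone hitting the same symptom: below is a minimal Python sketch (not from the thread) that answers the first question above by reading /proc/modules for the Ceph kernel-client modules (libceph, plus rbd and ceph, which use it), and decodes the negative errno printed in the dmesg lines. The helper names are hypothetical. On Linux, error -101 is -ENETUNREACH ("Network is unreachable"), which fits the guess that networking had already gone down. Also note that the [ceph-msgr] and [ceph-watch-noti] entries in the ps output have parent PID 2 (kthreadd), i.e. they are kernel threads rather than daemons, which is why stopping the ceph systemd units leaves them behind.

```python
import errno
import os
from pathlib import Path

# Kernel-client modules to look for (assumption: libceph is shared by rbd and ceph).
CEPH_KERNEL_MODULES = {"libceph", "rbd", "ceph"}


def loaded_modules(proc_modules_text):
    """Return the set of module names in /proc/modules-style text.

    Each line starts with the module name, followed by size, refcount,
    dependents, state and load address.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def ceph_kernel_clients(proc_modules_text):
    """Return the Ceph kernel-client modules present, sorted by name."""
    return sorted(loaded_modules(proc_modules_text) & CEPH_KERNEL_MODULES)


def describe_libceph_error(code):
    """Translate the negative errno from 'libceph: connect ... error -N'."""
    num = abs(code)
    return errno.errorcode.get(num, "UNKNOWN"), os.strerror(num)


if __name__ == "__main__":
    try:
        text = Path("/proc/modules").read_text()
    except OSError:
        text = None  # not Linux, or /proc is unavailable
    if text is not None:
        found = ceph_kernel_clients(text)
        print("Ceph kernel modules loaded:", ", ".join(found) or "none")
    print("error -101 means:", describe_libceph_error(-101))
```

If libceph shows up on a MON/OSD host, something on that host is using the kernel client (an rbd map or a kernel CephFS mount, for example), and that is worth tracking down before the next reboot.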