On Sat, Feb 11, 2017 at 1:08 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
> I believe I did shut down the mon process. Is that not done by the
>
> sudo systemctl stop ceph\*.service ceph\*.target

Oh, that's what I missed.

> command? Also, as I noted, the mon process does not show up in ps after I do
> that, but I still get the shutdown halting.
>
> The libceph kernel module may be installed. I did not do so deliberately, but
> I used ceph-deploy, so if it installs that, then that is why it's there. I
> also run some kubernetes pods with rbd persistent volumes on these machines,
> although no rbd volumes are in use or mounted when I try to shut down. In fact
> I unmapped all rbd volumes across the whole cluster to make sure. Is libceph
> required for rbd?
>
> But even so, is it normal for the libceph kernel module to prevent shutdown?
> Is there another stage in the shutdown procedure that I am missing?
>
>
> On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubbard@xxxxxxxxxx> wrote:
>
> That looks like dmesg output from the libceph kernel module. Do you
> have the libceph kernel module loaded?
>
> If the answer to that question is "yes", the follow-up question is
> "Why?", as it is not required for a MON or OSD host.
>
> On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
>> Yeah, all three mons have OSDs on the same machines.
>>
>> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>>
>>> Is your primary MON running on a host which some OSDs are also running on?
>>>
>>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>>> <michael@xxxxxxxxxxxxx> wrote:
>>> > Hi
>>> >
>>> > I am running a small cluster of 8 machines (80 osds), with three
>>> > monitors, on Ubuntu 16.04. Ceph version 10.2.5.
>>> >
>>> > I cannot reboot the monitors without physically going into the
>>> > datacenter and power cycling them.
>>> > What happens is that while shutting down, ceph gets
>>> > stuck trying to contact the other monitors, but networking has already
>>> > shut down, or something like that. I get an endless stream of:
>>> >
>>> > libceph: connect 10.20.0.10:6789 error -101
>>> > libceph: connect 10.20.0.13:6789 error -101
>>> > libceph: connect 10.20.0.17:6789 error -101
>>> >
>>> > where in this case 10.20.0.10 is the machine I am trying to shut down,
>>> > and all three IPs are the MONs.
>>> >
>>> > At this stage of the shutdown, the machine doesn't respond to pings, and
>>> > I cannot even log in on any of the virtual terminals. Nothing to do but
>>> > power it off at the server.
>>> >
>>> > The other non-mon servers shut down just fine, and the cluster was
>>> > healthy at the time I was rebooting the mon (I only reboot one machine
>>> > at a time, waiting for it to come up before I do the next one).
>>> >
>>> > Also worth mentioning that if I execute
>>> >
>>> > sudo systemctl stop ceph\*.service ceph\*.target
>>> >
>>> > on the server, the only things I see left are:
>>> >
>>> > root 11143 2 0 18:40 ? 00:00:00 [ceph-msgr]
>>> > root 11162 2 0 18:40 ? 00:00:00 [ceph-watch-noti]
>>> >
>>> > and even then, when no ceph daemons are left running, doing a reboot
>>> > goes into the same loop.
>>> >
>>> > I can't really find any mention of this online, but I feel someone must
>>> > have hit this. Any idea how to fix it? It's really annoying because it's
>>> > hard for me to get access to the datacenter.
>>> >
>>> > Thanks
>>> > Michael
>>> >
>>> > _______________________________________________
>>> > ceph-users mailing list
>>> > ceph-users@xxxxxxxxxxxxxx
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>
>>
>
> --
> Cheers,
> Brad
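A note for anyone who finds this thread later: one way to catch the situation Brad describes before issuing a reboot is to check whether any ceph-related kernel modules are still loaded. A minimal sketch (the `ceph_modules` helper name is mine, not from this thread); it filters lsmod-style output, read from stdin here so the sketch can be exercised on a machine without ceph installed:

```shell
#!/bin/sh
# Print any ceph-related kernel modules found in lsmod-style input.
# Reading from stdin (rather than running lsmod directly) keeps this
# testable on a host that has no ceph modules loaded.
ceph_modules() {
  awk 'NR > 1 && $1 ~ /^(libceph|rbd|ceph)$/ { print $1 }'
}

# Demo with a captured listing: note libceph is pinned by rbd
# ("Used by" column), so rbd must be removed before libceph can be.
printf 'Module                  Size  Used by
rbd                   102400  0
libceph               315392  1 rbd
ext4                  614400  2
' | ceph_modules
```

On a live host the equivalent check is `lsmod | ceph_modules`. If anything prints, the usual cleanup is to unmap any remaining rbd devices (`rbd unmap <device>`) and then remove the modules with `modprobe -r rbd libceph` (assuming nothing still uses them) before rebooting, so the kernel client has no MON connections left to retry during shutdown.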