Just making sure the list sees this, for those who are following.

On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
> Right, so yes, libceph is loaded
>
> root@compound-7:~# lsmod | egrep "ceph|rbd"
> rbd                    69632  0
> libceph               245760  1 rbd
> libcrc32c              16384  3 xfs,raid456,libceph
>
> I stopped all the services and unloaded the modules
>
> root@compound-7:~# systemctl stop ceph\*.service ceph\*.target
> root@compound-7:~# modprobe -r rbd
> root@compound-7:~# modprobe -r libceph
> root@compound-7:~# lsmod | egrep "ceph|rbd"
>
> Then rebooted
>
> root@compound-7:~# reboot
>
> And sure enough the reboot happened OK.
>
> So that solves my immediate problem, and I now know how to work around it
> (thanks!), but I would love to work out how to not need this step. Any
> further info I can give to help?
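One untested idea for making that unload automatic, so the manual step isn't
needed: a small systemd unit whose ExecStop runs the modprobe -r while the
network is still up (units stop in reverse start order, so ordering the unit
After=network.target means its ExecStop fires before networking is torn
down). This is only a sketch; the unit name is made up and I haven't
verified the ordering on 16.04:

    # /etc/systemd/system/unload-ceph-kmods.service  (hypothetical name)
    [Unit]
    Description=Unload rbd/libceph before networking goes down at shutdown
    After=network.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/bin/true
    # Runs on the way down, while the network is still reachable.
    # "|| true" so a module that is still in use (e.g. a mapped rbd
    # device) fails the unload without hanging the shutdown.
    ExecStop=/bin/sh -c '/sbin/modprobe -r rbd libceph || true'

    [Install]
    WantedBy=multi-user.target

Note it has to be enabled *and* started ("systemctl daemon-reload &&
systemctl enable --now unload-ceph-kmods.service"), otherwise the ExecStop
never fires.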
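Also, before relying on the unload, it may be worth checking that nothing is
still holding the modules. A quick pre-reboot check (the debugfs path is
where the kernel client keeps per-session state, assuming debugfs is
mounted; empty output is what you want):

    # anything still mapped?
    rbd showmapped
    # any live kernel-client sessions left?
    ls /sys/kernel/debug/ceph/ 2>/dev/null
    # the "Used by" count should be 0 for rbd before modprobe -r
    lsmod | egrep 'ceph|rbd'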
> On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen <michael@xxxxxxxxxxxxx>
> wrote:
>>
>> Sorry, this email arrived out of order. I will do the modprobe -r test.
>>
>> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>
>>> On Sat, Feb 11, 2017 at 2:08 PM, Michael Andersen <michael@xxxxxxxxxxxxx>
>>> wrote:
>>> > I believe I did shut down the mon process. Is that not done by the
>>> >
>>> > sudo systemctl stop ceph\*.service ceph\*.target
>>> >
>>> > command? Also, as I noted, the mon process does not show up in ps
>>> > after I do that, but I still get the shutdown halting.
>>> >
>>> > The libceph kernel module may be loaded. I did not load it
>>> > deliberately, but I used ceph-deploy, so if that installs it, that is
>>> > why it's there. I also run some kubernetes pods with rbd persistent
>>> > volumes on these machines, although no rbd volumes are in use or
>>> > mounted when I try to shut down. In fact I unmapped all rbd volumes
>>> > across the whole cluster to make sure. Is libceph required for rbd?
>>>
>>> For kernel rbd (/dev/rbd0, etc.) yes; for librbd, no.
>>>
>>> As a test, try modprobe -r on both the libceph and rbd modules before
>>> shutdown and see if that helps ("modprobe -r rbd" should unload
>>> libceph as well, but verify that).
>>>
>>> > But even so, is it normal for the libceph kernel module to prevent
>>> > shutdown? Is there another stage in the shutdown procedure that I am
>>> > missing?
>>> >
>>> > On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubbard@xxxxxxxxxx> wrote:
>>> >
>>> > That looks like dmesg output from the libceph kernel module. Do you
>>> > have the libceph kernel module loaded?
>>> >
>>> > If the answer to that question is "yes", the follow-up question is
>>> > "Why?", as it is not required for a MON or OSD host.
>>> >
>>> > On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen
>>> > <michael@xxxxxxxxxxxxx> wrote:
>>> >> Yeah, all three mons have OSDs on the same machines.
>>> >>
>>> >> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>> >>>
>>> >>> Is your primary MON running on a host that some OSDs are also
>>> >>> running on?
>>> >>>
>>> >>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>>> >>> <michael@xxxxxxxxxxxxx> wrote:
>>> >>> > Hi
>>> >>> >
>>> >>> > I am running a small cluster of 8 machines (80 OSDs), with three
>>> >>> > monitors, on Ubuntu 16.04. Ceph version 10.2.5.
>>> >>> >
>>> >>> > I cannot reboot the monitors without physically going into the
>>> >>> > datacenter and power cycling them. What happens is that while
>>> >>> > shutting down, ceph gets stuck trying to contact the other
>>> >>> > monitors, but networking has already been shut down, or something
>>> >>> > like that. I get an endless stream of:
>>> >>> >
>>> >>> > libceph: connect 10.20.0.10:6789 error -101
>>> >>> > libceph: connect 10.20.0.13:6789 error -101
>>> >>> > libceph: connect 10.20.0.17:6789 error -101
>>> >>> >
>>> >>> > where in this case 10.20.0.10 is the machine I am trying to shut
>>> >>> > down and all three IPs are the MONs.
>>> >>> >
>>> >>> > At this stage of the shutdown, the machine doesn't respond to
>>> >>> > pings, and I cannot even log in on any of the virtual terminals.
>>> >>> > Nothing to do but power off at the server.
>>> >>> >
>>> >>> > The other non-mon servers shut down just fine, and the cluster
>>> >>> > was healthy at the time I was rebooting the mon (I only reboot
>>> >>> > one machine at a time, waiting for it to come up before I do the
>>> >>> > next one).
>>> >>> >
>>> >>> > Also worth mentioning that if I execute
>>> >>> >
>>> >>> > sudo systemctl stop ceph\*.service ceph\*.target
>>> >>> >
>>> >>> > on the server, the only ceph-related things left in ps are:
>>> >>> >
>>> >>> > root  11143  2  0 18:40 ?  00:00:00 [ceph-msgr]
>>> >>> > root  11162  2  0 18:40 ?  00:00:00 [ceph-watch-noti]
>>> >>> >
>>> >>> > and even then, when no ceph daemons are left running, doing a
>>> >>> > reboot goes into the same loop.
>>> >>> >
>>> >>> > I can't really find any mention of this online, but I feel
>>> >>> > someone must have hit this. Any idea how to fix it? It's really
>>> >>> > annoying because it's hard for me to get access to the
>>> >>> > datacenter.
>>> >>> >
>>> >>> > Thanks
>>> >>> > Michael

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com