On Sat, Feb 11, 2017 at 2:58 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> Just making sure the list sees this for those that are following.
>
> On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
>> Right, so yes, libceph is loaded:
>>
>> root@compound-7:~# lsmod | egrep "ceph|rbd"
>> rbd                    69632  0
>> libceph               245760  1 rbd
>> libcrc32c              16384  3 xfs,raid456,libceph
>>
>> I stopped all the services and unloaded the modules:
>>
>> root@compound-7:~# systemctl stop ceph\*.service ceph\*.target
>> root@compound-7:~# modprobe -r rbd
>> root@compound-7:~# modprobe -r libceph
>> root@compound-7:~# lsmod | egrep "ceph|rbd"
>>
>> Then rebooted:
>>
>> root@compound-7:~# reboot
>>
>> And sure enough the reboot happened OK.
>>
>> So that solves my immediate problem. I now know how to work around it
>> (thanks!), but I would love to work out how to not need this step. Any
>> further info I can give to help?

Can you double-check that all rbd volumes are unmounted on this host
when shutting down? Maybe unmap them just for good measure. I don't
believe the libceph module should need to talk to the cluster unless
it has active connections at the time of shutdown.

>> On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
>>>
>>> Sorry, this email arrived out of order. I will do the modprobe -r test.
>>>
>>> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>>
>>>> On Sat, Feb 11, 2017 at 2:08 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
>>>> > I believe I did shut down the mon process. Is that not done by the
>>>> >
>>>> > sudo systemctl stop ceph\*.service ceph\*.target
>>>> >
>>>> > command? Also, as I noted, the mon process does not show up in ps
>>>> > after I do that, but I still get the shutdown halting.
>>>> >
>>>> > The libceph kernel module may be installed. I did not do so
>>>> > deliberately, but I used ceph-deploy, so if it installs that, then
>>>> > that is why it's there. I also run some kubernetes pods with rbd
>>>> > persistent volumes on these machines, although no rbd volumes are in
>>>> > use or mounted when I try to shut down. In fact I unmapped all rbd
>>>> > volumes across the whole cluster to make sure. Is libceph required
>>>> > for rbd?
>>>>
>>>> For kernel rbd (/dev/rbd0, etc.) yes; for librbd, no.
>>>>
>>>> As a test, try modprobe -r on both the libceph and rbd modules before
>>>> shutdown and see if that helps ("modprobe -r rbd" should unload
>>>> libceph as well, but verify that).
>>>>
>>>> > But even so, is it normal for the libceph kernel module to prevent
>>>> > shutdown? Is there another stage in the shutdown procedure that I am
>>>> > missing?
>>>> >
>>>> > On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubbard@xxxxxxxxxx> wrote:
>>>> >
>>>> > That looks like dmesg output from the libceph kernel module. Do you
>>>> > have the libceph kernel module loaded?
>>>> >
>>>> > If the answer to that question is "yes", the follow-up question is
>>>> > "Why?", as it is not required for a MON or OSD host.
>>>> >
>>>> > On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen
>>>> > <michael@xxxxxxxxxxxxx> wrote:
>>>> >> Yeah, all three mons have OSDs on the same machines.
>>>> >>
>>>> >> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>>> >>>
>>>> >>> Is your primary MON running on a host that some OSDs are also
>>>> >>> running on?
>>>> >>>
>>>> >>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>>>> >>> <michael@xxxxxxxxxxxxx> wrote:
>>>> >>> > Hi,
>>>> >>> >
>>>> >>> > I am running a small cluster of 8 machines (80 OSDs), with three
>>>> >>> > monitors, on Ubuntu 16.04, Ceph version 10.2.5.
>>>> >>> >
>>>> >>> > I cannot reboot the monitors without physically going into the
>>>> >>> > datacenter and power cycling them. What happens is that while
>>>> >>> > shutting down, ceph gets stuck trying to contact the other
>>>> >>> > monitors, but networking has already shut down, or something
>>>> >>> > like that. I get an endless stream of:
>>>> >>> >
>>>> >>> > libceph: connect 10.20.0.10:6789 error -101
>>>> >>> > libceph: connect 10.20.0.13:6789 error -101
>>>> >>> > libceph: connect 10.20.0.17:6789 error -101
>>>> >>> >
>>>> >>> > where in this case 10.20.0.10 is the machine I am trying to shut
>>>> >>> > down and all three IPs are the MONs.
>>>> >>> >
>>>> >>> > At this stage of the shutdown, the machine doesn't respond to
>>>> >>> > pings, and I cannot even log in on any of the virtual terminals.
>>>> >>> > Nothing to do but poweroff at the server.
>>>> >>> >
>>>> >>> > The other non-mon servers shut down just fine, and the cluster
>>>> >>> > was healthy at the time I was rebooting the mon (I only reboot
>>>> >>> > one machine at a time, waiting for it to come up before I do the
>>>> >>> > next one).
>>>> >>> >
>>>> >>> > Also worth mentioning: if I execute
>>>> >>> >
>>>> >>> > sudo systemctl stop ceph\*.service ceph\*.target
>>>> >>> >
>>>> >>> > on the server, the only things I see are:
>>>> >>> >
>>>> >>> > root  11143     2  0 18:40 ?  00:00:00 [ceph-msgr]
>>>> >>> > root  11162     2  0 18:40 ?  00:00:00 [ceph-watch-noti]
>>>> >>> >
>>>> >>> > and even then, when no ceph daemons are left running, doing a
>>>> >>> > reboot goes into the same loop.
>>>> >>> >
>>>> >>> > I can't really find any mention of this online, but I feel
>>>> >>> > someone must have hit this. Any idea how to fix it? It's really
>>>> >>> > annoying because it's hard for me to get access to the
>>>> >>> > datacenter.
>>>> >>> >
>>>> >>> > Thanks,
>>>> >>> > Michael

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
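
For reference, the workaround that got the host rebooting cleanly, pulled
together from the commands in the thread above, amounts to a short pre-reboot
checklist. The sketch below assumes a MON/OSD host that also carries kernel
rbd mappings (e.g. the kubernetes rbd volumes mentioned above); the loop over
/dev/rbd[0-9]* and the exact ordering are illustrative, not an official Ceph
procedure.

#!/bin/sh
# Pre-reboot checklist (sketch) for a MON/OSD host that also uses kernel rbd.

# 1. Stop the local ceph daemons, as in the thread.
systemctl stop ceph\*.service ceph\*.target

# 2. Make sure no kernel rbd device is still mounted, then unmap them all.
#    Kernel rbd devices show up as /dev/rbd0, /dev/rbd1, ...
rbd showmapped                 # list current mappings, if any
for dev in /dev/rbd[0-9]*; do
    [ -e "$dev" ] || continue
    umount "$dev" 2>/dev/null  # ignore the error if it was not mounted
    rbd unmap "$dev"
done

# 3. Unload the kernel modules so nothing tries to reach the MONs after the
#    network goes down ("modprobe -r rbd" should pull libceph out too, but
#    verify with lsmod).
modprobe -r rbd
modprobe -r libceph
lsmod | egrep "ceph|rbd"       # should print nothing

# 4. Reboot; the "libceph: connect ... error -101" loop should be gone.
reboot

As noted earlier in the thread, libceph is only needed for kernel rbd
(/dev/rbdX); hosts that access images purely through librbd (e.g. qemu) do
not load it and should not need this step.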