On Sat, Feb 11, 2017 at 2:58 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> Just making sure the list sees this for those that are following.
>
> On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
>> Right, so yes, libceph is loaded:
>>
>> root@compound-7:~# lsmod | egrep "ceph|rbd"
>> rbd                    69632  0
>> libceph               245760  1 rbd
>> libcrc32c              16384  3 xfs,raid456,libceph
>>
>> I stopped all the services and unloaded the modules:
>>
>> root@compound-7:~# systemctl stop ceph\*.service ceph\*.target
>> root@compound-7:~# modprobe -r rbd
>> root@compound-7:~# modprobe -r libceph
>> root@compound-7:~# lsmod | egrep "ceph|rbd"
>>
>> Then rebooted:
>>
>> root@compound-7:~# reboot
>>
>> And sure enough the reboot happened OK.
>>
>> So that solves my immediate problem. I now know how to work around it
>> (thanks!), but I would love to work out how to not need this step. Any
>> further info I can give to help?

Can you double-check that all rbd volumes are unmounted on this host
when shutting down? Maybe unmap them just for good measure. I don't
believe the libceph module should need to talk to the cluster unless
it has active connections at the time of shutdown.

>> On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
>>>
>>> Sorry, this email arrived out of order. I will do the modprobe -r test.
>>>
>>> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>>>
>>>> On Sat, Feb 11, 2017 at 2:08 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
>>>> > I believe I did shut down the mon process. Is that not done by the
>>>> >
>>>> > sudo systemctl stop ceph\*.service ceph\*.target
>>>> >
>>>> > command? Also, as I noted, the mon process does not show up in ps
>>>> > after I do that, but I still get the shutdown halting.
>>>> >
>>>> > The libceph kernel module may be installed. I did not do so
>>>> > deliberately, but I used ceph-deploy, so if it installs that, then
>>>> > that is why it's there. I also run some kubernetes pods with rbd
>>>> > persistent volumes on these machines, although no rbd volumes are in
>>>> > use or mounted when I try to shut down. In fact I unmapped all rbd
>>>> > volumes across the whole cluster to make sure. Is libceph required
>>>> > for rbd?
>>>>
>>>> For kernel rbd (/dev/rbd0, etc.) yes; for librbd, no.
>>>>
>>>> As a test, try modprobe -r on both the libceph and rbd modules before
>>>> shutdown and see if that helps ("modprobe -r rbd" should unload
>>>> libceph as well, but verify that).
>>>>
>>>> > But even so, is it normal for the libceph kernel module to prevent
>>>> > shutdown? Is there another stage in the shutdown procedure that I am
>>>> > missing?
>>>> >
>>>> > On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubbard@xxxxxxxxxx> wrote:
>>>> >
>>>> > That looks like dmesg output from the libceph kernel module. Do you
>>>> > have the libceph kernel module loaded?
>>>> >
>>>> > If the answer to that question is "yes", the follow-up question is
>>>> > "Why?", as it is not required for a MON or OSD host.
>>>> >
>>>> > On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen
>>>> > <michael@xxxxxxxxxxxxx> wrote:
>>>> >> Yeah, all three mons have OSDs on the same machines.
>>>> >>
>>>> >> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>>> >>>
>>>> >>> Is your primary MON running on a host that some OSDs are also
>>>> >>> running on?
>>>> >>>
>>>> >>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>>>> >>> <michael@xxxxxxxxxxxxx> wrote:
>>>> >>> > Hi,
>>>> >>> >
>>>> >>> > I am running a small cluster of 8 machines (80 OSDs), with three
>>>> >>> > monitors, on Ubuntu 16.04, Ceph version 10.2.5.
>>>> >>> >
>>>> >>> > I cannot reboot the monitors without physically going into the
>>>> >>> > datacenter and power cycling them. What happens is that while
>>>> >>> > shutting down, ceph gets stuck trying to contact the other
>>>> >>> > monitors, but networking has already shut down, or something
>>>> >>> > like that. I get an endless stream of:
>>>> >>> >
>>>> >>> > libceph: connect 10.20.0.10:6789 error -101
>>>> >>> > libceph: connect 10.20.0.13:6789 error -101
>>>> >>> > libceph: connect 10.20.0.17:6789 error -101
>>>> >>> >
>>>> >>> > where in this case 10.20.0.10 is the machine I am trying to shut
>>>> >>> > down and all three IPs are the MONs.
>>>> >>> >
>>>> >>> > At this stage of the shutdown, the machine doesn't respond to
>>>> >>> > pings, and I cannot even log in on any of the virtual terminals.
>>>> >>> > Nothing to do but poweroff at the server.
>>>> >>> >
>>>> >>> > The other non-mon servers shut down just fine, and the cluster
>>>> >>> > was healthy at the time I was rebooting the mon (I only reboot
>>>> >>> > one machine at a time, waiting for it to come up before I do the
>>>> >>> > next one).
>>>> >>> >
>>>> >>> > Also worth mentioning: if I execute
>>>> >>> >
>>>> >>> > sudo systemctl stop ceph\*.service ceph\*.target
>>>> >>> >
>>>> >>> > on the server, the only things I see are:
>>>> >>> >
>>>> >>> > root  11143     2  0 18:40 ?  00:00:00 [ceph-msgr]
>>>> >>> > root  11162     2  0 18:40 ?  00:00:00 [ceph-watch-noti]
>>>> >>> >
>>>> >>> > and even then, when no ceph daemons are left running, doing a
>>>> >>> > reboot goes into the same loop.
>>>> >>> >
>>>> >>> > I can't really find any mention of this online, but I feel
>>>> >>> > someone must have hit this. Any idea how to fix it? It's really
>>>> >>> > annoying because it's hard for me to get access to the
>>>> >>> > datacenter.
>>>> >>> >
>>>> >>> > Thanks,
>>>> >>> > Michael

--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
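
For reference, the workaround that got the host rebooting cleanly, pulled
together from the commands in the thread above, amounts to a short pre-reboot
checklist. The sketch below assumes a MON/OSD host that also carries kernel
rbd mappings (e.g. the kubernetes rbd volumes mentioned above); the loop over
/dev/rbd[0-9]* and the exact ordering are illustrative, not an official Ceph
procedure.

#!/bin/sh
# Pre-reboot checklist (sketch) for a MON/OSD host that also uses kernel rbd.

# 1. Stop the local ceph daemons, as in the thread.
systemctl stop ceph\*.service ceph\*.target

# 2. Make sure no kernel rbd device is still mounted, then unmap them all.
#    Kernel rbd devices show up as /dev/rbd0, /dev/rbd1, ...
rbd showmapped                 # list current mappings, if any
for dev in /dev/rbd[0-9]*; do
    [ -e "$dev" ] || continue
    umount "$dev" 2>/dev/null  # ignore the error if it was not mounted
    rbd unmap "$dev"
done

# 3. Unload the kernel modules so nothing tries to reach the MONs after the
#    network goes down ("modprobe -r rbd" should pull libceph out too, but
#    verify with lsmod).
modprobe -r rbd
modprobe -r libceph
lsmod | egrep "ceph|rbd"       # should print nothing

# 4. Reboot; the "libceph: connect ... error -101" loop should be gone.
reboot

As noted earlier in the thread, libceph is only needed for kernel rbd
(/dev/rbdX); hosts that access images purely through librbd (e.g. qemu) do
not load it and should not need this step.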