Re: Cannot shutdown monitors

Michael Andersen <michael@xxxxxxxxxxxxx> · Fri, 10 Feb 2017 20:08:56 -0800

I believe I did shut down the mon process. Is that not done by the

sudo systemctl stop ceph\*.service ceph\*.target

command? Also, as I noted, the mon process does not show up in ps after I do that, but the shutdown still hangs.

The libceph kernel module may be installed. I did not install it deliberately, but I used ceph-deploy, so if that installs it, that is why it's there. I also run some kubernetes pods with rbd persistent volumes on these machines, although no rbd volumes are in use or mounted when I try to shut down. In fact I unmapped all rbd volumes across the whole cluster to make sure. Is libceph required for rbd?
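One way to check whether the module is actually loaded is to look at /proc/modules. The sketch below is only an illustration: `list_ceph_kmods` is a made-up helper name, and it assumes the kernel clients show up as modules named libceph, rbd and ceph (they would not appear if compiled into the kernel). As far as I know the kernel rbd client is built on libceph, so mapping an image with `rbd map` pulls in both.

```shell
# Sketch: list Ceph-related kernel modules from a /proc/modules-style file.
# list_ceph_kmods is a hypothetical helper name, not part of any Ceph tool.
list_ceph_kmods() {
    # $1: file to read; defaults to the live /proc/modules
    modules_file="${1:-/proc/modules}"
    # /proc/modules fields: name size refcount dependents state address
    awk '$1 == "libceph" || $1 == "rbd" || $1 == "ceph" {
        deps = $4
        sub(/,$/, "", deps)   # drop the trailing comma of the dependents list
        printf "%s refcount=%s used_by=%s\n", $1, $3, deps
    }' "$modules_file"
}

# Usage: list_ceph_kmods            (reads /proc/modules)
#        list_ceph_kmods saved.txt  (reads a saved copy)
```

An empty result means none of the three modules is loaded. If rbd shows refcount=0 and nothing is mapped, unloading it with `sudo modprobe -r rbd` before the reboot might be worth a try, though whether that avoids the hang is untested here.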

But even so, is it normal for the libceph kernel module to prevent shutdown? Is there another stage in the shutdown procedure that I am missing?
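On the hang itself: the -101 in the libceph lines quoted further down is a negated Linux errno. On x86 and other architectures that use the generic errno table, 101 is ENETUNREACH ("Network is unreachable"), which would fit the guess that networking is already down at that point. A small sketch for decoding such lines follows; `decode_libceph_errors` is a hypothetical helper, and its table covers only a few common codes.

```shell
# Sketch: translate "error -N" in libceph connect lines to an errno name.
# decode_libceph_errors is a hypothetical helper; the numbers are the
# generic Linux values (x86 etc.) and differ on some architectures.
decode_libceph_errors() {
    # Reads kernel log lines on stdin and keeps only libceph connect errors,
    # reducing each to "<peer> <errno>".
    sed -n 's/^.*libceph: connect \([^ ]*\) error -\([0-9][0-9]*\).*$/\1 \2/p' |
    while read -r peer code; do
        case "$code" in
            101) name="ENETUNREACH (network is unreachable)" ;;
            110) name="ETIMEDOUT (connection timed out)" ;;
            111) name="ECONNREFUSED (connection refused)" ;;
            113) name="EHOSTUNREACH (no route to host)" ;;
            *)   name="errno $code (not in this small table)" ;;
        esac
        printf '%s: %s\n' "$peer" "$name"
    done
}

# Usage: dmesg | decode_libceph_errors
```

Lines that are not libceph connect errors are ignored, so the whole dmesg output can be piped through it.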

On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubbard@xxxxxxxxxx> wrote:
That looks like dmesg output from the libceph kernel module. Do you
have the libceph kernel module loaded?

If the answer to that question is "yes" the follow-up question is
"Why?" as it is not required for a MON or OSD host.

On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
> Yeah, all three mons have OSDs on the same machines.
>
> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>
>> Is your primary MON running on the host which some OSDs are running on?
>>
>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>> <michael@xxxxxxxxxxxxx> wrote:
>> > Hi
>> >
>> > I am running a small cluster of 8 machines (80 osds), with three monitors on
>> > Ubuntu 16.04. Ceph version 10.2.5.
>> >
>> > I cannot reboot the monitors without physically going into the datacenter
>> > and power cycling them. What happens is that while shutting down, ceph gets
>> > stuck trying to contact the other monitors but networking has already shut
>> > down or something like that. I get an endless stream of:
>> >
>> > libceph: connect 10.20.0.10:6789 error -101
>> > libceph: connect 10.20.0.13:6789 error -101
>> > libceph: connect 10.20.0.17:6789 error -101
>> >
>> > where in this case 10.20.0.10 is the machine I am trying to shut down and
>> > all three IPs are the MONs.
>> >
>> > At this stage of the shutdown, the machine doesn't respond to pings, and I
>> > cannot even log in on any of the virtual terminals. Nothing to do but
>> > poweroff at the server.
>> >
>> > The other non-mon servers shut down just fine, and the cluster was healthy
>> > at the time I was rebooting the mon (I only reboot one machine at a time,
>> > waiting for it to come up before I do the next one).
>> >
>> > Also worth mentioning that if I execute
>> >
>> > sudo systemctl stop ceph\*.service ceph\*.target
>> >
>> > on the server, the only ceph-related things I still see in ps are:
>> >
>> > root     11143     2  0 18:40 ?        00:00:00 [ceph-msgr]
>> > root     11162     2  0 18:40 ?        00:00:00 [ceph-watch-noti]
>> >
>> > and even then, when no ceph daemons are left running, doing a reboot goes
>> > into the same loop.
>> >
>> > I can't really find any mention of this online, but I feel someone must have
>> > hit this. Any idea how to fix it? It's really annoying because it's hard for
>> > me to get access to the datacenter.
>> >
>> > Thanks
>> > Michael
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >

--
Cheers,
Brad
