Re: Cannot shutdown monitors

Brad Hubbard <bhubbard@xxxxxxxxxx> · Sat, 11 Feb 2017 13:49:53 +1000

That looks like dmesg output from the libceph kernel module. Do you
have the libceph kernel module loaded?

If the answer to that question is "yes" the follow-up question is
"Why?" as it is not required for a MON or OSD host.

On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen <michael@xxxxxxxxxxxxx> wrote:
> Yeah, all three mons have OSDs on the same machines.
>
> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>
>> Is your primary MON running on the host which some OSDs are running on?
>>
>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>> <michael@xxxxxxxxxxxxx> wrote:
>> > Hi
>> >
>> > I am running a small cluster of 8 machines (80 osds), with three
>> > monitors on
>> > Ubuntu 16.04. Ceph version 10.2.5.
>> >
>> > I cannot reboot the monitors without physically going into the
>> > datacenter
>> > and power cycling them. What happens is that while shutting down, ceph
>> > gets
>> > stuck trying to contact the other monitors but networking has already
>> > shut
>> > down or something like that. I get an endless stream of:
>> >
>> > libceph: connect 10.20.0.10:6789 error -101
>> > libceph: connect 10.20.0.13:6789 error -101
>> > libceph: connect 10.20.0.17:6789 error -101
>> >
>> > where in this case 10.20.0.10 is the machine I am trying to shut down
>> > and
>> > all three IPs are the MONs.
>> >
>> > At this stage of the shutdown, the machine doesn't respond to pings, and
>> > I
>> > cannot even log in on any of the virtual terminals. Nothing to do but
>> > poweroff at the server.
>> >
>> > The other non-mon servers shut down just fine, and the cluster was
>> > healthy
>> > at the time I was rebooting the mon (I only reboot one machine at a
>> > time,
>> > waiting for it to come up before I do the next one).
>> >
>> > Also worth mentioning that if I execute
>> >
>> > sudo systemctl stop ceph\*.service ceph\*.target
>> >
>> > on the server, the only things I see are:
>> >
>> > root     11143     2  0 18:40 ?        00:00:00 [ceph-msgr]
>> > root     11162     2  0 18:40 ?        00:00:00 [ceph-watch-noti]
>> >
>> > and even then, when no ceph daemons are left running, doing a reboot
>> > goes
>> > into the same loop.
>> >
>> > I can't really find any mention of this online, but I feel someone must
>> > have
>> > hit this. Any idea how to fix it? It's really annoying because its hard
>> > for
>> > me to get access to the datacenter.
>> >
>> > Thanks
>> > Michael
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com