Re: Cannot shutdown monitors

Shinobu Kinjo <skinjo@xxxxxxxxxx> · Sat, 11 Feb 2017 12:47:23 +0900

You may need to stop MON process (if it's acting as primary).

And once you make sure that all OSDs sessions are moved to another
MON, you would be able to shutdown physical host.

Have you tried that?

On Sat, Feb 11, 2017 at 12:18 PM, Michael Andersen
<michael@xxxxxxxxxxxxx> wrote:
> Yeah, all three mons have OSDs on the same machines.
>
> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <skinjo@xxxxxxxxxx> wrote:
>>
>> Is your primary MON running on the host which some OSDs are running on?
>>
>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>> <michael@xxxxxxxxxxxxx> wrote:
>> > Hi
>> >
>> > I am running a small cluster of 8 machines (80 osds), with three
>> > monitors on
>> > Ubuntu 16.04. Ceph version 10.2.5.
>> >
>> > I cannot reboot the monitors without physically going into the
>> > datacenter
>> > and power cycling them. What happens is that while shutting down, ceph
>> > gets
>> > stuck trying to contact the other monitors but networking has already
>> > shut
>> > down or something like that. I get an endless stream of:
>> >
>> > libceph: connect 10.20.0.10:6789 error -101
>> > libceph: connect 10.20.0.13:6789 error -101
>> > libceph: connect 10.20.0.17:6789 error -101
>> >
>> > where in this case 10.20.0.10 is the machine I am trying to shut down
>> > and
>> > all three IPs are the MONs.
>> >
>> > At this stage of the shutdown, the machine doesn't respond to pings, and
>> > I
>> > cannot even log in on any of the virtual terminals. Nothing to do but
>> > poweroff at the server.
>> >
>> > The other non-mon servers shut down just fine, and the cluster was
>> > healthy
>> > at the time I was rebooting the mon (I only reboot one machine at a
>> > time,
>> > waiting for it to come up before I do the next one).
>> >
>> > Also worth mentioning that if I execute
>> >
>> > sudo systemctl stop ceph\*.service ceph\*.target
>> >
>> > on the server, the only things I see are:
>> >
>> > root     11143     2  0 18:40 ?        00:00:00 [ceph-msgr]
>> > root     11162     2  0 18:40 ?        00:00:00 [ceph-watch-noti]
>> >
>> > and even then, when no ceph daemons are left running, doing a reboot
>> > goes
>> > into the same loop.
>> >
>> > I can't really find any mention of this online, but I feel someone must
>> > have
>> > hit this. Any idea how to fix it? It's really annoying because its hard
>> > for
>> > me to get access to the datacenter.
>> >
>> > Thanks
>> > Michael
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com