Re: Issue with "renamed" mon, crashing


 



Hi Kamila,

 

Thank you for your response.

 

I think we solved it yesterday.

I simply removed the mon again and this time I also removed all references to it in ceph.conf (had some remnants there).

After that I ran ceph-deploy again, and the mon hasn’t crashed since.

 

So in this case it was most likely leftovers from the old mon in the config that fscked things up. I don’t get why, but it works now that I removed all traces of the old mon first and then recreated it. Before that I had removed and recreated it a bunch of times as well, but with some leftovers still in ceph.conf, and that was when it didn’t work.
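
A quick way to look for this kind of leftover is to search ceph.conf for the old mon ID before recreating the monitor. The following is only a minimal sketch: the function name find_mon_remnants is hypothetical, and /etc/ceph/ceph.conf in the usage note is assumed to be the config path on the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print every line of a config file
# that still mentions the given monitor ID, prefixed with its line number.
# Returns non-zero when no reference is found.
find_mon_remnants() {
    conf=$1
    mon_id=$2
    # -n: show line numbers
    # -w: whole-word match, so "ceph03" does not match "ceph03mon"
    # -F: treat the ID as a literal string, not a regular expression
    grep -n -w -F -- "$mon_id" "$conf"
}
```

For example, `find_mon_remnants /etc/ceph/ceph.conf ceph03mon` would list sections such as [mon.ceph03mon] or a "mon initial members" line that still names the old mon. Note that -w treats a hyphen as a word boundary, so hyphenated IDs may produce extra matches.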

 

//Anders

 

From: Kamila Součková [mailto:kamila@xxxxxx]
Sent: 8 November 2017 13:43
To: Anders Olausson <anders@xxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Issue with "renamed" mon, crashing

 

Hi,

 

I am not sure if this is the same issue as we had recently, but it looks a bit like it -- we also had a Luminous mon crashing right after syncing was done.

 

Turns out that the current release has a bug which causes the mon to crash if it cannot find a mgr daemon. This should be fixed in the upcoming release.

 

In our case we "solved" it by moving the active mgr to the mon's host. (I am not sure how to activate a specific mgr, but it appears that the mgrs get activated in FIFO order -- so just keep killing and re-starting the active one until a mgr on the mon's host is active).
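
To see which mgr is currently active between restarts, something like the sketch below may help. It assumes that `ceph mgr dump` prints JSON containing an "active_name" field; the function name active_mgr_name is hypothetical, and the parsing is a plain sed match rather than a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph mgr dump` JSON on stdin and print the value
# of the "active_name" field (assumed to hold the name of the active mgr).
active_mgr_name() {
    # Capture the quoted value after "active_name": and print the first match.
    sed -n 's/.*"active_name": *"\([^"]*\)".*/\1/p' | head -n 1
}
```

Usage would be `ceph mgr dump | active_mgr_name`, repeated after each restart of the active mgr until the printed name is the mgr on the mon's host.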

 

Hope this helps!

 

Kamila

 

On Mon, Nov 6, 2017 at 12:44 PM Anders Olausson <anders@xxxxxxxxxxxx> wrote:

Hi,

 

I recently (yesterday) upgraded to Luminous (12.2.1) running on Ubuntu 14.04.5 LTS.

Upgrade went fine, no issues at all.

However, when I was about to use ceph-deploy to configure some new disks, it failed.

After some investigation I figured out that it didn’t like that my mons were named differently from their hosts, for example ceph03mon on the host ceph03; ceph-deploy gatherkeys ceph03 failed.

So I decided to rename my mons. I started by removing one of them:

 

# stop ceph-mon id=ceph03mon

# ceph mon remove ceph03mon

# cd /var/lib/ceph/mon/

# mv ceph-ceph03mon disabled-ceph-ceph03mon

 

Created the new one:

 

# mkdir tmp

# mkdir ceph-ceph03

# ceph auth get mon. -o tmp/keyring

# ceph mon getmap -o tmp/monmap

# ceph-mon -i ceph03 --mkfs --monmap tmp/monmap --keyring tmp/keyring

# chown -R ceph:ceph ceph-ceph03

# ceph-mon -i ceph03 --public-addr 10.10.1.23:6789

# start ceph-mon id=ceph03
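
After steps like the ones above, it can be worth confirming that the monmap lists the new name and no longer lists the old one. This is a rough sketch only: mon_in_monmap is a hypothetical helper, and it assumes that `ceph mon dump` prints one line per monitor ending in "mon.<name>".

```shell
#!/bin/sh
# Hypothetical helper: read `ceph mon dump` text on stdin and succeed
# (exit 0) only if some line ends with the field "mon.<name>".
mon_in_monmap() {
    # $NF is the last whitespace-separated field of each line; it is
    # compared literally against "mon.<name>".
    awk -v want="mon.$1" '$NF == want { found = 1 } END { exit !found }'
}
```

For example, `ceph mon dump | mon_in_monmap ceph03 && echo "ceph03 is in the monmap"`, and the same check with ceph03mon should fail once the old mon has been removed.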

 

It starts OK and quorum is established, but when it gets a command such as “ceph osd pool stat” or “ceph auth list”, it crashes.

 

Complete log can be found at: http://files.spacedump.se/ceph03-monerror-20171106-01.txt

I used the following logging settings in ceph.conf at the time:

 

[mon]

       debug mon = 20

       debug paxos = 20

       debug auth = 20

 

I have now rolled back to the old monitor, and it works as it should on the same box. But it’s the one that was upgraded from Hammer -> Jewel -> Luminous.

 

Any idea what the issue could be?

Thanks.

 

Best regards

  Anders Olausson

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
