jewel upgrade : MON unable to start


 



Hi,

 

I’m “sort of” following the upgrade instructions on CentOS 7.2.

I upgraded 3 OSD nodes without too many issues, although I would rewrite those upgrade instructions as follows:

 

#chrony has UID 167 on my systems, which is the UID the jewel packages want for the ceph user... this was set at install time! But I use NTP anyway.

yum remove chrony

sed -i -e '/chrony/d' /etc/passwd

#“service ceph stop” no longer works after the yum update, so I had to run it before updating (or killall the ceph daemons…).

service ceph stop

yum -y update

chown ceph:ceph /var/lib/ceph

#this fixed some OSDs which failed to start because of permission denied errors on the journals.

chown -RL --dereference ceph:ceph /var/lib/ceph

#not done automatically:

systemctl enable ceph-osd.target ceph.target

#“systemctl start ceph-osd.target” has absolutely no effect, and neither does any other .target, at least for me right after the upgrade.

ceph-disk activate-all
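A minimal sketch of two sanity checks around the steps above, assuming a POSIX shell with awk and find. The first lists any account that already holds a given UID (167, which chrony had taken on my systems); the second lists anything under a directory that is still not owned by a given user after the chown. Because find -L follows symlinks, like the chown -RL above it also checks the journal devices the OSD symlinks point to. The function names uid_owner and not_owned_by are hypothetical helpers, not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helper functions (not part of Ceph) for checking the upgrade steps.

# uid_owner FILE UID
# Print the name of every account in the passwd-format FILE whose
# numeric UID (third colon-separated field) equals UID.
# Empty output means the UID is free.
uid_owner() {
    awk -F: -v uid="$2" '$3 == uid { print $1 }' "$1"
}

# not_owned_by DIR USER
# List every path under DIR whose owner is not USER. -L makes find follow
# symlinks (e.g. OSD journal links) and test the target's owner.
# Empty output means the chown covered everything.
not_owned_by() {
    find -L "$1" ! -user "$2" -print
}
```

Usage: run `uid_owner /etc/passwd 167` before `yum -y update` (anything other than empty output or “ceph” is a conflict), and `not_owned_by /var/lib/ceph ceph` after the chown (it should print nothing).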

 

Anyway, now I’m trying to upgrade the MON nodes… and I’m facing an issue.

I started with one MON and left the 2 others untouched (still on hammer).

 

First, the MON would not start:

May 02 15:40:58 ceph2_snip_ ceph-mon[789124]: warning: unable to create /var/run/ceph: (13) Permission denied

 

No problem: I created the directory and chowned it.
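The manual fix above can be sketched as a small idempotent function, assuming a POSIX shell. The name ensure_run_dir is a hypothetical helper, not part of Ceph; OWNER is passed as user:group to chown.

```shell
#!/bin/sh
# ensure_run_dir DIR OWNER
# Hypothetical helper: create DIR (and any missing parents) if needed,
# then hand it to OWNER (user:group). Safe to run more than once.
ensure_run_dir() {
    mkdir -p "$1" &&
    chown "$2" "$1"
}
```

Usage: `ensure_run_dir /var/run/ceph ceph:ceph`. Note that on CentOS 7 /var/run is a symlink to /run, which is a tmpfs, so a directory created by hand is gone after a reboot; a systemd-tmpfiles rule such as `d /run/ceph 0770 ceph ceph -` in a file under /etc/tmpfiles.d/ would recreate it at boot (the file name and the 0770 mode are assumptions).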

But I’m still unable to start this MON; journalctl tells me:

 

May 02 16:05:49 ceph2_snip ceph-mon[804583]: starting mon.ceph2 rank 2 at _ipsnip_.72:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph2 fsid 70ac4a78-46c0-45e6-8ff9-878b37f50fa1

May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7f774d7d94c0 time 2016-05-02 16:05:49.487984

May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: 607: FAILED assert(i.second.state == MDSMap::STATE_STANDBY)

May 02 16:05:49 ceph2_snip ceph-mon[804583]: ceph version 10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9)

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f774de221e5]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 2: (FSMap::sanity() const+0x952) [0x7f774dd3f972]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 3: (MDSMonitor::update_from_paxos(bool*)+0x490) [0x7f774db5cba0]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 4: (PaxosService::refresh(bool*)+0x1a5) [0x7f774dacdda5]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 5: (Monitor::refresh_from_paxos(bool*)+0x15b) [0x7f774da674bb]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 6: (Monitor::init_paxos()+0x95) [0x7f774da67955]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 7: (Monitor::preinit()+0x949) [0x7f774da77b39]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 8: (main()+0x23e3) [0x7f774da03e93]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 9: (__libc_start_main()+0xf5) [0x7f774ad6fb15]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: 10: (()+0x25e401) [0x7f774da57401]

May 02 16:05:49 ceph2_snip ceph-mon[804583]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

(…)

 

=> I’m now stuck with a half-jewel, half-hammer cluster… DOH! :’(

 

I’ve seen a matching bug on the bug tracker, but I can’t find a workaround:

http://tracker.ceph.com/issues/15591

 

I stopped the jewel MDS on that node: no fix.

I removed the MDS from the config: no fix.

I stopped all MDS daemons everywhere (I’m not using CephFS for now): no fix.

 

Should I go ahead and upgrade everything, hoping that all the MONs have to be upgraded and stopped simultaneously, and ONLY THEN restarted, before they will start up correctly?

The upgrade instructions seem to imply that all MONs should be restarted, but I fear that might lead me to disaster right now…

Strangely, the remaining hammer MONs do see the jewel MDS if I start it…

 

Any advice?

 

Regards

 

 

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
