Re: jewel upgrade : MON unable to start


 



I believe this is because I did not read the instructions thoroughly enough... this is my first "live upgrade".


-----Original Message-----
From: Oleksandr Natalenko [mailto:oleksandr@xxxxxxxxxxxxxx] 
Sent: Monday, May 2, 2016 16:39
To: SCHAER Frederic <frederic.schaer@xxxxxx>; ceph-users@xxxxxxxx
Subject: Re:  jewel upgrade : MON unable to start

Why did you upgrade the OSDs first, when the MONs have to be upgraded before everything else?
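
For reference, the ordering can be sketched as a small shell helper. This is only an illustration: the function name upgrade_order and the "<host> <role>" input format are made up for this example, and the order it encodes (MONs, then OSDs, then MDS, then RGW) is the one assumed from the Jewel upgrade notes, so check it against the release notes for your version.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): reads "<host> <role>" lines on
# stdin and prints "<role> <host>" lines in the order the daemons should
# be upgraded. Assumed order: mon, osd, mds, rgw. Unknown roles are dropped.
upgrade_order() {
    awk '
        BEGIN { rank["mon"] = 1; rank["osd"] = 2; rank["mds"] = 3; rank["rgw"] = 4 }
        ($2 in rank) { print rank[$2], $2, $1 }
    ' | sort -k1,1n -k3,3 | cut -d" " -f2-
}
```

Feeding it an inventory such as `printf 'osd1 osd\nceph2 mon\n' | upgrade_order` lists the MON host before the OSD host.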

On May 2, 2016 5:31:43 PM GMT+03:00, SCHAER Frederic <frederic.schaer@xxxxxx> wrote:
>Hi,
>
>I'm < sort of > following the upgrade instructions on CentOS 7.2.
>I upgraded 3 OSD nodes without too many issues, even if I would rewrite
>those upgrade instructions to :
>
>
>#chrony has ID 167 on my systems... this was set at install time ! but
>I use NTP anyway.
>
>yum remove chrony
>
>sed -i -e '/chrony/d' /etc/passwd
>
>#there is no more "service ceph stop" possible after the yum update, so
>I had to run it before. Or killall ceph daemons...
>
>service ceph stop
>
>yum -y update
>
>chown ceph:ceph /var/lib/ceph
>
>#this fixed some OSDs which failed to start because of permission denied
>issues on the journals.
>
>chown -RL --dereference ceph:ceph /var/lib/ceph
>
>#not done automatically :
>
>systemctl enable ceph-osd.target ceph.target
>
>#systemctl start ceph-osd.target has absolutely no effect. Nor any
>.target targets, at least for me, and right after the upgrade.
>
>ceph-disk activate-all
>
>Anyways. Now I'm trying to upgrade the MON nodes... and I'm facing an
>issue.
>I started with one MON and left the 2 others untouched (hammer).
>
>First, the mons did not want to start :
>May 02 15:40:58 ceph2_snip_ ceph-mon[789124]: warning: unable to create
>/var/run/ceph: (13) Permission denied
>
>No problem: I created and chowned the directory.
>But I'm now still unable to start this MON, journalctl tells me :
>
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: starting mon.ceph2 rank 2
>at _ipsnip_.72:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph2 fsid
>70ac4a78-46c0-45e6-8ff9-878b37f50fa1
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: In function
>'void FSMap::sanity() const' thread 7f774d7d94c0 time 2016-05-02
>16:05:49.487984
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: 607: FAILED
>assert(i.second.state == MDSMap::STATE_STANDBY)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: ceph version 10.2.0
>(3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 1:
>(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x85) [0x7f774de221e5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 2: (FSMap::sanity()
>const+0x952) [0x7f774dd3f972]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 3:
>(MDSMonitor::update_from_paxos(bool*)+0x490) [0x7f774db5cba0]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 4:
>(PaxosService::refresh(bool*)+0x1a5) [0x7f774dacdda5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 5:
>(Monitor::refresh_from_paxos(bool*)+0x15b) [0x7f774da674bb]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 6:
>(Monitor::init_paxos()+0x95) [0x7f774da67955]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 7:
>(Monitor::preinit()+0x949) [0x7f774da77b39]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 8: (main()+0x23e3)
>[0x7f774da03e93]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 9:
>(__libc_start_main()+0xf5) [0x7f774ad6fb15]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 10: (()+0x25e401)
>[0x7f774da57401]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: NOTE: a copy of the
>executable, or `objdump -rdS <executable>` is needed to interpret this.
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 2016-05-02 16:05:49.490966
>7f774d7d94c0 -1 mds/FSMap.cc: In function 'void FSMap::sanity() const'
>thread 7f774d7d94c0 time 2016-05-02 16:05:49.487984
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: 607: FAILED
>assert(i.second.state == MDSMap::STATE_STANDBY)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: ceph version 10.2.0
>(3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 1:
>(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x85) [0x7f774de221e5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 2: (FSMap::sanity()
>const+0x952) [0x7f774dd3f972]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 3:
>(MDSMonitor::update_from_paxos(bool*)+0x490) [0x7f774db5cba0]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 4:
>(PaxosService::refresh(bool*)+0x1a5) [0x7f774dacdda5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 5:
>(Monitor::refresh_from_paxos(bool*)+0x15b) [0x7f774da674bb]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 6:
>(Monitor::init_paxos()+0x95) [0x7f774da67955]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 7:
>(Monitor::preinit()+0x949) [0x7f774da77b39]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 8: (main()+0x23e3)
>[0x7f774da03e93]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 9:
>(__libc_start_main()+0xf5) [0x7f774ad6fb15]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 10: (()+0x25e401)
>[0x7f774da57401]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: NOTE: a copy of the
>executable, or `objdump -rdS <executable>` is needed to interpret this.
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 0> 2016-05-02
>16:05:49.490966 7f774d7d94c0 -1 mds/FSMap.cc: In function 'void
>FSMap::sanity() const' thread 7f774d7d94c0 time 2016-05-02
>16:05:49.487984
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: 607: FAILED
>assert(i.second.state == MDSMap::STATE_STANDBY)
>(...)
>
>
>I'm now stuck with a half jewel/hammer cluster... DOH ! :'(
>
>I've seen a bug on the bug tracker, but I fail to find a workaround:
>http://tracker.ceph.com/issues/15591
>
>I stopped the jewel MDS on that node : this did not fix anything
>I removed the MDS from the config : no fix
>I stopped all MDS everywhere (I'm not using cephfs for now) : no fix.
>
>Should I go ahead and upgrade everything in the hope that it is
>required to upgrade and stop all the MONs simultaneously, and ONLY THEN
>restart them in order to get the MONs to correctly start up ?
>The upgrade instructions seem to imply that all MONs should be
>restarted, but I feel it might lead me to a disaster right now... ?
>Strangely, the remaining hammer MONs do see the jewel MDS if I start
>it... ?
>
>Any advice ?
>
>Regards
>
>
>
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
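
The per-node OSD steps quoted above can be collected into one script. This is a rough sketch, not a tested upgrade procedure: the commands are the ones from the quoted message, while the run wrapper, the DRY_RUN variable and the upgrade_osd_node function name are assumptions added for illustration. The chrony removal is left out because it is specific to that system, and per the reply above the MONs should be upgraded before this is run on any OSD node.

```shell
#!/bin/sh
# Dry-run by default: each step is printed instead of executed.
# Set DRY_RUN=0 to really run the commands (as root).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, run it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Steps taken from the quoted message, in the same order.
upgrade_osd_node() {
    run service ceph stop                   # must happen before the update
    run yum -y update
    run chown -RL ceph:ceph /var/lib/ceph   # -L also follows journal symlinks
    run systemctl enable ceph-osd.target ceph.target
    run ceph-disk activate-all
}
```

Calling `upgrade_osd_node` with the default DRY_RUN=1 only prints the five commands, which makes it easy to review the sequence before running anything on a live node.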




