I believe this is because I did not read the instructions thoroughly enough... this is my first "live upgrade".

-----Original Message-----
From: Oleksandr Natalenko [mailto:oleksandr@xxxxxxxxxxxxxx]
Sent: Monday, May 2, 2016 16:39
To: SCHAER Frederic <frederic.schaer@xxxxxx>; ceph-users@xxxxxxxx
Subject: Re: jewel upgrade : MON unable to start

Why do you upgrade osds first if it is necessary to upgrade mons before everything else?

On May 2, 2016 5:31:43 PM GMT+03:00, SCHAER Frederic <frederic.schaer@xxxxxx> wrote:
>Hi,
>
>I'm < sort of > following the upgrade instructions on CentOS 7.2.
>I upgraded 3 OSD nodes without too many issues, even if I would rewrite
>those upgrade instructions to :
>
>
>#chrony has ID 167 on my systems... this was set at install time ! but
>I use NTP anyway.
>
>yum remove chrony
>
>sed -i -e '/chrony/d' /etc/passwd
>
>#there is no more "service ceph stop" possible after the yum update, so
>I had to run it before. Or killall ceph daemons...
>
>service ceph stop
>
>yum -y update
>
>chown ceph:ceph /var/lib/ceph
>
>#this fixed some OSDs which failed to start because of permission denied
>issues on the journals.
>
>chown -RL --dereference ceph:ceph /var/lib/ceph
>
>#not done automatically :
>
>systemctl enable ceph-osd.target ceph.target
>
>#systemctl start ceph-osd.target has absolutely no effect. Nor any
>.target targets, at least for me, and right after the upgrade.
>
>ceph-disk activate-all
>
>Anyways. Now I'm trying to upgrade the MON nodes... and I'm facing an
>issue.
>I started with one MON and left the 2 others untouched (hammer).
>
>First, the mons did not want to start :
>May 02 15:40:58 ceph2_snip_ ceph-mon[789124]: warning: unable to create
>/var/run/ceph: (13) Permission denied
>
>No problem: I created and chowned the directory.
>But I'm now still unable to start this MON, journalctl tells me :
>
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: starting mon.ceph2 rank 2
>at _ipsnip_.72:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph2 fsid
>70ac4a78-46c0-45e6-8ff9-878b37f50fa1
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: In function
>'void FSMap::sanity() const' thread 7f774d7d94c0 time 2016-05-02
>16:05:49.487984
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: 607: FAILED
>assert(i.second.state == MDSMap::STATE_STANDBY)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: ceph version 10.2.0
>(3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 1:
>(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x85) [0x7f774de221e5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 2: (FSMap::sanity()
>const+0x952) [0x7f774dd3f972]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 3:
>(MDSMonitor::update_from_paxos(bool*)+0x490) [0x7f774db5cba0]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 4:
>(PaxosService::refresh(bool*)+0x1a5) [0x7f774dacdda5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 5:
>(Monitor::refresh_from_paxos(bool*)+0x15b) [0x7f774da674bb]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 6:
>(Monitor::init_paxos()+0x95) [0x7f774da67955]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 7:
>(Monitor::preinit()+0x949) [0x7f774da77b39]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 8: (main()+0x23e3)
>[0x7f774da03e93]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 9:
>(__libc_start_main()+0xf5) [0x7f774ad6fb15]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 10: (()+0x25e401)
>[0x7f774da57401]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: NOTE: a copy of the
>executable, or `objdump -rdS <executable>` is needed to interpret this.
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 2016-05-02 16:05:49.490966
>7f774d7d94c0 -1 mds/FSMap.cc: In function 'void FSMap::sanity() const'
>thread 7f774d7d94c0 time 2016-05-02 16:05:49.487984
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: 607: FAILED
>assert(i.second.state == MDSMap::STATE_STANDBY)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: ceph version 10.2.0
>(3a9fba20ec743699b69bd0181dd6c54dc01c64b9)
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 1:
>(ceph::__ceph_assert_fail(char const*, char const*, int, char
>const*)+0x85) [0x7f774de221e5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 2: (FSMap::sanity()
>const+0x952) [0x7f774dd3f972]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 3:
>(MDSMonitor::update_from_paxos(bool*)+0x490) [0x7f774db5cba0]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 4:
>(PaxosService::refresh(bool*)+0x1a5) [0x7f774dacdda5]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 5:
>(Monitor::refresh_from_paxos(bool*)+0x15b) [0x7f774da674bb]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 6:
>(Monitor::init_paxos()+0x95) [0x7f774da67955]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 7:
>(Monitor::preinit()+0x949) [0x7f774da77b39]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 8: (main()+0x23e3)
>[0x7f774da03e93]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 9:
>(__libc_start_main()+0xf5) [0x7f774ad6fb15]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 10: (()+0x25e401)
>[0x7f774da57401]
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: NOTE: a copy of the
>executable, or `objdump -rdS <executable>` is needed to interpret this.
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: 0> 2016-05-02
>16:05:49.490966 7f774d7d94c0 -1 mds/FSMap.cc: In function 'void
>FSMap::sanity() const' thread 7f774d7d94c0 time 2016-05-02
>16:05:49.487984
>May 02 16:05:49 ceph2_snip ceph-mon[804583]: mds/FSMap.cc: 607: FAILED
>assert(i.second.state == MDSMap::STATE_STANDBY)
>(...)
>
>
>I'm now stuck with a half jewel/hammer cluster... DOH !
:'(
>
>I've seen a bug on the bugtracker, but I fail to find a workaround ?
>http://tracker.ceph.com/issues/15591
>
>I stopped the jewel MDS on that node : this did not fix anything
>I removed the MDS from the config : no fix
>I stopped all MDS everywhere (I'm not using cephfs for now) : no fix.
>
>Should I go ahead and upgrade everything in the hope that it is
>required to upgrade and stop all the MONs simultaneously, and ONLY THEN
>restart them in order to get the MONs to correctly start up ?
>The upgrade instructions seem to imply that all MONs should be
>restarted, but I feel it might lead me to a disaster right now... ?
>Strangely, the remaining hammer MONs do see the jewel MDS if I start
>it... ?
>
>Any advice ?
>
>Regards
>
>------------------------------------------------------------------------
>
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com