On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer <bsharer@xxxxxxxxxxxxxx> wrote:
> I've been in the process of updating my Gentoo-based cluster, both with
> new hardware and a somewhat postponed update. This includes some major
> changes, including the switch from gcc 4.x to 5.4.0 on the existing
> hardware and using gcc 6.4.0 to make better use of AMD Ryzen on the new
> hardware. The existing cluster was on 10.2.2, but I was going to
> 10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
> transitioning to bluestore on the OSDs.
>
> The Ryzen units are slated to be bluestore-based OSD servers if and when
> I get to that point. Up until the MDS failure, they were simply cephfs
> clients. I had three OSD servers updated to 10.2.7-r1 (one is also a
> MON) and had two servers left to update. Both of these are also MONs
> and were acting as a pair of dual active MDS servers running 10.2.2.
> Monday morning I found out the hard way that a UPS one of them was on
> had a dead battery. After I fsck'd and came back up, I saw the
> following assertion error when it was trying to start its mds.B server:
>
> ==== mdsbeacon(64162/B up:replay seq 3 v4699) v7 ==== 126+0+0 (709014160 0 0) 0x7f6fb4001bc0 con 0x55f94779d8d0
>      0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
> function 'virtual void EImportStart::replay(MDSRank*)' thread
> 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
> mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == cmapv)
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x55f93d64a122]
>  2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
>  3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
>  5: (()+0x74a4) [0x7f6fd009b4a4]
>  6: (clone()+0x6d) [0x7f6fce5a598d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 newstore
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 kinetic
>    1/ 5 fuse
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent 10000
>   max_new 1000
>   log_file /var/log/ceph/ceph-mds.B.log
>
> When I was googling around, I ran into this CERN presentation and tried
> out the offline backward scrubbing commands on slide 25 first:
>
> https://indico.cern.ch/event/531810/contributions/2309925/attachments/1357386/2053998/GoncaloBorges-HEPIX16-v3.pdf
>
> Both ran without any messages, so I'm assuming I have sane contents in
> the cephfs_data and cephfs_metadata pools. Still no luck getting things
> restarted, so I tried the cephfs-journal-tool journal reset on slide
> 23. That didn't work either.
> Just for giggles, I tried setting up the two Ryzen boxes as new mds.C
> and mds.D servers, which would run on 10.2.7-r1 instead of using mds.A
> and mds.B (10.2.2). The D server fails with the same assert, as follows:
>
> === 132+0+1979520 (4198351460 0 1611007530) 0x7fffc4000a70 con 0x7fffe0013310
>      0> 2017-10-09 13:01:31.571195 7fffd99f5700 -1 mds/journal.cc: In
> function 'virtual void EImportStart::replay(MDSRank*)' thread
> 7fffd99f5700 time 2017-10-09 13:01:31.570608
> mds/journal.cc: 2949: FAILED assert(mds->sessionmap.get_version() == cmapv)
>
>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x555555b7ebc8]
>  2: (EImportStart::replay(MDSRank*)+0x9ea) [0x555555a5674a]
>  3: (MDLog::_replay_thread()+0xe51) [0x5555559cef21]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x5555557778cd]
>  5: (()+0x7364) [0x7ffff7bc5364]
>  6: (clone()+0x6d) [0x7ffff6051ccd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.

Because this system was running multiple active MDSs on Jewel (based on
seeing an EImportStart journal entry), and that configuration was known
to be unstable, I would advise you to blow away the filesystem and create
a fresh one on Luminous (where multi-MDS is stable), rather than trying
to debug it. Going back to work out what went wrong in the Jewel code is
probably not a very valuable activity unless you have irreplaceable data.

If you do want to get this filesystem back on its feet in place (first
stopping all MDSs): I'm guessing that your cephfs-journal-tool reset
didn't help because you had multiple MDS ranks, and that tool just
operates on rank 0 by default. You need to work out which rank's journal
is actually damaged (the rank is part of the prefix of the MDS log
messages), and then pass a --rank argument to cephfs-journal-tool. You
will also need to reset all the other ranks' journals to keep things
consistent, and then do a "ceph fs reset" so that the filesystem starts
up with a single MDS next time; there is a rough sketch of the commands
at the end of this mail.

If you get the filesystem up and running again, I'd still recommend
copying anything important off it and creating a new filesystem on
Luminous, rather than continuing to run with maybe-still-subtly-damaged
metadata.

John
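P.S. A minimal sketch of that recovery sequence. It assumes the filesystem
has the default name "cephfs" and that ranks 0 and 1 were your two active
MDSs; both of those are assumptions on my part, so substitute your own
values. The exact form of the --rank argument also differs between
releases (some builds take just the rank number, newer ones take
<fs_name>:<rank>), so check cephfs-journal-tool --help on your version:

  # with every MDS daemon stopped
  cephfs-journal-tool --rank=0 journal reset    # or --rank=cephfs:0 on newer builds
  cephfs-journal-tool --rank=1 journal reset    # repeat for every rank that was active
  ceph fs reset cephfs --yes-i-really-mean-it   # drop back to a single active MDS

After that, bring one MDS back up and see whether it gets through replay
this time.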