I'm wondering whether I can get the second mds back up at all. That
offline backward scrub check sounds like it should also be able to
salvage what it can of the two pools into a normal filesystem. Is there
an option for that, or has someone written some form of salvage tool?
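
The closest thing I've found so far is the offline rebuild from the
disaster recovery docs, i.e. regenerating the metadata from the data
pool objects and then copying everything off through a normal mount.
Roughly what I have in mind is below; this is only a sketch from my
reading of the Jewel docs, and the mount host and salvage path are
made-up placeholders:

    # With every MDS stopped: rebuild file sizes/mtimes from the raw
    # data objects, then rebuild inode backtraces into the metadata pool.
    cephfs-data-scan scan_extents cephfs_data
    cephfs-data-scan scan_inodes cephfs_data

    # If an MDS will start after that, mount the filesystem and salvage
    # its contents to ordinary storage:
    mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    rsync -a /mnt/cephfs/ /srv/salvage/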

On 10/11/2017 07:07 AM, John Spray wrote:
> On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer <bsharer@xxxxxxxxxxxxxx> wrote:
>> I've been in the process of updating my gentoo-based cluster, both
>> with new hardware and a somewhat postponed update. This includes some
>> major stuff, including the switch from gcc 4.x to 5.4.0 on existing
>> hardware and using gcc 6.4.0 to make better use of AMD Ryzen on the
>> new hardware. The existing cluster was on 10.2.2, but I was going to
>> 10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
>> transitioning to bluestore on the OSDs.
>>
>> The Ryzen units are slated to be bluestore-based OSD servers if and
>> when I get to that point. Up until the mds failure, they were simply
>> cephfs clients. I had three OSD servers updated to 10.2.7-r1 (one is
>> also a MON) and had two servers left to update. Both of these are
>> also MONs and were acting as a pair of dual active MDS servers
>> running 10.2.2. Monday morning I found out the hard way that the UPS
>> one of them was on has a dead battery. After I fsck'd and the box
>> came back up, I saw the following assertion error when it was trying
>> to start its mds.B server:
>>
>> ==== mdsbeacon(64162/B up:replay seq 3 v4699) v7 ==== 126+0+0
>> (709014160 0 0) 0x7f6fb4001bc0 con 0x55f94779d8d0
>> 0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
>> function 'virtual void EImportStart::replay(MDSRank*)' thread
>> 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
>> mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == cmapv)
>>
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x82) [0x55f93d64a122]
>> 2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
>> 3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
>> 4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
>> 5: (()+0x74a4) [0x7f6fd009b4a4]
>> 6: (clone()+0x6d) [0x7f6fce5a598d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- logging levels ---
>> 0/ 5 none
>> 0/ 1 lockdep
>> 0/ 1 context
>> 1/ 1 crush
>> 1/ 5 mds
>> 1/ 5 mds_balancer
>> 1/ 5 mds_locker
>> 1/ 5 mds_log
>> 1/ 5 mds_log_expire
>> 1/ 5 mds_migrator
>> 0/ 1 buffer
>> 0/ 1 timer
>> 0/ 1 filer
>> 0/ 1 striper
>> 0/ 1 objecter
>> 0/ 5 rados
>> 0/ 5 rbd
>> 0/ 5 rbd_mirror
>> 0/ 5 rbd_replay
>> 0/ 5 journaler
>> 0/ 5 objectcacher
>> 0/ 5 client
>> 0/ 5 osd
>> 0/ 5 optracker
>> 0/ 5 objclass
>> 1/ 3 filestore
>> 1/ 3 journal
>> 0/ 5 ms
>> 1/ 5 mon
>> 0/10 monc
>> 1/ 5 paxos
>> 0/ 5 tp
>> 1/ 5 auth
>> 1/ 5 crypto
>> 1/ 1 finisher
>> 1/ 5 heartbeatmap
>> 1/ 5 perfcounter
>> 1/ 5 rgw
>> 1/10 civetweb
>> 1/ 5 javaclient
>> 1/ 5 asok
>> 1/ 1 throttle
>> 0/ 0 refs
>> 1/ 5 xio
>> 1/ 5 compressor
>> 1/ 5 newstore
>> 1/ 5 bluestore
>> 1/ 5 bluefs
>> 1/ 3 bdev
>> 1/ 5 kstore
>> 4/ 5 rocksdb
>> 4/ 5 leveldb
>> 1/ 5 kinetic
>> 1/ 5 fuse
>> -2/-2 (syslog threshold)
>> -1/-1 (stderr threshold)
>> max_recent 10000
>> max_new 1000
>> log_file /var/log/ceph/ceph-mds.B.log
>>
>> When I was googling around, I ran into this CERN presentation and
>> tried out the offline backward scrubbing commands on slide 25 first:
>>
>> https://indico.cern.ch/event/531810/contributions/2309925/attachments/1357386/2053998/GoncaloBorges-HEPIX16-v3.pdf
>>
>> Both ran without any messages, so I'm assuming I have sane contents
>> in the cephfs_data and cephfs_metadata pools. Still no luck getting
>> things restarted, so I tried the cephfs-journal-tool journal reset on
>> slide 23. That didn't work either. Just for giggles, I tried setting
>> up the two Ryzen boxes as new mds.C and mds.D servers running
>> 10.2.7-r1 instead of using mds.A and mds.B (10.2.2). The D server
>> fails with the same assert as follows:
>
> Because this system was running multiple active MDSs on Jewel (based
> on seeing an EImportStart journal entry), and that was known to be
> unstable, I would advise you to blow away the filesystem and create a
> fresh one using luminous (where multi-MDS is stable), rather than
> trying to debug it. Going back to try to work out what went wrong
> with the Jewel code is probably not a very valuable activity unless
> you have irreplaceable data.
>
> If you do want to get this filesystem back on its feet in place
> (first stopping all MDSs): I'm guessing that your cephfs-journal-tool
> reset didn't help because you had multiple MDS ranks, and that tool
> just operates on rank 0 by default. You need to work out which rank's
> journal is actually damaged (it's part of the prefix to MDS log
> messages), and then pass a --rank argument to cephfs-journal-tool.
> You will also need to reset all the other ranks' journals to keep
> things consistent, and then do a "ceph fs reset" so that it will start
> up with a single MDS next time. If you get the filesystem up and
> running again, I'd still recommend copying anything important off it
> and creating a new one using luminous, rather than continuing to run
> with maybe-still-subtly-damaged metadata.
>
> John
>
>> === 132+0+1979520 (4198351460 0 1611007530) 0x7fffc4000a70 con
>> 0x7fffe0013310
>> 0> 2017-10-09 13:01:31.571195 7fffd99f5700 -1 mds/journal.cc: In
>> function 'virtual void EImportStart::replay(MDSRank*)' thread
>> 7fffd99f5700 time 2017-10-09 13:01:31.570608
>> mds/journal.cc: 2949: FAILED assert(mds->sessionmap.get_version() == cmapv)
>> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x80) [0x555555b7ebc8]
>> 2: (EImportStart::replay(MDSRank*)+0x9ea) [0x555555a5674a]
>> 3: (MDLog::_replay_thread()+0xe51) [0x5555559cef21]
>> 4: (MDLog::ReplayThread::entry()+0xd) [0x5555557778cd]
>> 5: (()+0x7364) [0x7ffff7bc5364]
>> 6: (clone()+0x6d) [0x7ffff6051ccd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
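
If I'm reading John's suggestion right, the in-place attempt would go
something like the sketch below before anything else touches the
filesystem. The filesystem name "cephfs" is my assumption, ranks 0 and
1 match the dual active setup, and the session-table reset is my own
addition prompted by the sessionmap assert rather than something John
listed:

    # All MDS daemons stopped first.

    # Work out which rank's journal is actually damaged:
    cephfs-journal-tool --rank=0 journal inspect
    cephfs-journal-tool --rank=1 journal inspect

    # Keep a copy of each journal before resetting anything:
    cephfs-journal-tool --rank=0 journal export backup.rank0.bin
    cephfs-journal-tool --rank=1 journal export backup.rank1.bin

    # Reset every rank's journal so they stay consistent with each other:
    cephfs-journal-tool --rank=0 journal reset
    cephfs-journal-tool --rank=1 journal reset

    # The failed assert is on the sessionmap, so the session table
    # probably needs clearing too (my addition, not John's step):
    cephfs-table-tool all reset session

    # Collapse back to a single active MDS before restarting anything:
    ceph fs reset cephfs --yes-i-really-mean-it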

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com