I'm wondering whether I can get the second mds back up at all. That
offline backward scrub check sounds like it should also be able to
salvage what it can of the two pools into a normal filesystem. Is there
an option for that, or has someone written some form of salvage tool?
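
The closest thing I've found so far is the offline rebuild from the
disaster recovery docs, i.e. regenerating the metadata from the data
pool objects and then copying everything off through a normal mount.
Roughly what I have in mind is below; this is only a sketch from my
reading of the Jewel docs, and the mount host and salvage path are
made-up placeholders:

    # With every MDS stopped: rebuild file sizes/mtimes from the raw
    # data objects, then rebuild inode backtraces into the metadata pool.
    cephfs-data-scan scan_extents cephfs_data
    cephfs-data-scan scan_inodes cephfs_data

    # If an MDS will start after that, mount the filesystem and salvage
    # its contents to ordinary storage:
    mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    rsync -a /mnt/cephfs/ /srv/salvage/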

On 10/11/2017 07:07 AM, John Spray wrote:
> On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer <bsharer@xxxxxxxxxxxxxx> wrote:
>> I've been in the process of updating my gentoo-based cluster, both
>> with new hardware and a somewhat postponed update. This includes some
>> major stuff, including the switch from gcc 4.x to 5.4.0 on existing
>> hardware and using gcc 6.4.0 to make better use of AMD Ryzen on the
>> new hardware. The existing cluster was on 10.2.2, but I was going to
>> 10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
>> transitioning to bluestore on the OSDs.
>>
>> The Ryzen units are slated to be bluestore-based OSD servers if and
>> when I get to that point. Up until the mds failure, they were simply
>> cephfs clients. I had three OSD servers updated to 10.2.7-r1 (one is
>> also a MON) and had two servers left to update. Both of these are
>> also MONs and were acting as a pair of dual active MDS servers
>> running 10.2.2. Monday morning I found out the hard way that the UPS
>> one of them was on has a dead battery. After I fsck'd and the box
>> came back up, I saw the following assertion error when it was trying
>> to start its mds.B server:
>>
>> ==== mdsbeacon(64162/B up:replay seq 3 v4699) v7 ==== 126+0+0
>> (709014160 0 0) 0x7f6fb4001bc0 con 0x55f94779d8d0
>> 0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
>> function 'virtual void EImportStart::replay(MDSRank*)' thread
>> 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
>> mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == cmapv)
>>
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x82) [0x55f93d64a122]
>> 2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
>> 3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
>> 4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
>> 5: (()+0x74a4) [0x7f6fd009b4a4]
>> 6: (clone()+0x6d) [0x7f6fce5a598d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- logging levels ---
>> 0/ 5 none
>> 0/ 1 lockdep
>> 0/ 1 context
>> 1/ 1 crush
>> 1/ 5 mds
>> 1/ 5 mds_balancer
>> 1/ 5 mds_locker
>> 1/ 5 mds_log
>> 1/ 5 mds_log_expire
>> 1/ 5 mds_migrator
>> 0/ 1 buffer
>> 0/ 1 timer
>> 0/ 1 filer
>> 0/ 1 striper
>> 0/ 1 objecter
>> 0/ 5 rados
>> 0/ 5 rbd
>> 0/ 5 rbd_mirror
>> 0/ 5 rbd_replay
>> 0/ 5 journaler
>> 0/ 5 objectcacher
>> 0/ 5 client
>> 0/ 5 osd
>> 0/ 5 optracker
>> 0/ 5 objclass
>> 1/ 3 filestore
>> 1/ 3 journal
>> 0/ 5 ms
>> 1/ 5 mon
>> 0/10 monc
>> 1/ 5 paxos
>> 0/ 5 tp
>> 1/ 5 auth
>> 1/ 5 crypto
>> 1/ 1 finisher
>> 1/ 5 heartbeatmap
>> 1/ 5 perfcounter
>> 1/ 5 rgw
>> 1/10 civetweb
>> 1/ 5 javaclient
>> 1/ 5 asok
>> 1/ 1 throttle
>> 0/ 0 refs
>> 1/ 5 xio
>> 1/ 5 compressor
>> 1/ 5 newstore
>> 1/ 5 bluestore
>> 1/ 5 bluefs
>> 1/ 3 bdev
>> 1/ 5 kstore
>> 4/ 5 rocksdb
>> 4/ 5 leveldb
>> 1/ 5 kinetic
>> 1/ 5 fuse
>> -2/-2 (syslog threshold)
>> -1/-1 (stderr threshold)
>> max_recent 10000
>> max_new 1000
>> log_file /var/log/ceph/ceph-mds.B.log
>>
>> When I was googling around, I ran into this CERN presentation and
>> tried out the offline backward scrubbing commands on slide 25 first:
>>
>> https://indico.cern.ch/event/531810/contributions/2309925/attachments/1357386/2053998/GoncaloBorges-HEPIX16-v3.pdf
>>
>> Both ran without any messages, so I'm assuming I have sane contents
>> in the cephfs_data and cephfs_metadata pools. Still no luck getting
>> things restarted, so I tried the cephfs-journal-tool journal reset on
>> slide 23. That didn't work either. Just for giggles, I tried setting
>> up the two Ryzen boxes as new mds.C and mds.D servers running
>> 10.2.7-r1 instead of using mds.A and mds.B (10.2.2). The D server
>> fails with the same assert as follows:
>
> Because this system was running multiple active MDSs on Jewel (based
> on seeing an EImportStart journal entry), and that was known to be
> unstable, I would advise you to blow away the filesystem and create a
> fresh one using luminous (where multi-MDS is stable), rather than
> trying to debug it. Going back to try to work out what went wrong
> with the Jewel code is probably not a very valuable activity unless
> you have irreplaceable data.
>
> If you do want to get this filesystem back on its feet in place
> (first stopping all MDSs): I'm guessing that your cephfs-journal-tool
> reset didn't help because you had multiple MDS ranks, and that tool
> just operates on rank 0 by default. You need to work out which rank's
> journal is actually damaged (it's part of the prefix to MDS log
> messages), and then pass a --rank argument to cephfs-journal-tool.
> You will also need to reset all the other ranks' journals to keep
> things consistent, and then do a "ceph fs reset" so that it will start
> up with a single MDS next time. If you get the filesystem up and
> running again, I'd still recommend copying anything important off it
> and creating a new one using luminous, rather than continuing to run
> with maybe-still-subtly-damaged metadata.
>
> John
>
>> === 132+0+1979520 (4198351460 0 1611007530) 0x7fffc4000a70 con
>> 0x7fffe0013310
>> 0> 2017-10-09 13:01:31.571195 7fffd99f5700 -1 mds/journal.cc: In
>> function 'virtual void EImportStart::replay(MDSRank*)' thread
>> 7fffd99f5700 time 2017-10-09 13:01:31.570608
>> mds/journal.cc: 2949: FAILED assert(mds->sessionmap.get_version() == cmapv)
>> ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x80) [0x555555b7ebc8]
>> 2: (EImportStart::replay(MDSRank*)+0x9ea) [0x555555a5674a]
>> 3: (MDLog::_replay_thread()+0xe51) [0x5555559cef21]
>> 4: (MDLog::ReplayThread::entry()+0xd) [0x5555557778cd]
>> 5: (()+0x7364) [0x7ffff7bc5364]
>> 6: (clone()+0x6d) [0x7ffff6051ccd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
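
If I'm reading John's suggestion right, the in-place attempt would go
something like the sketch below before anything else touches the
filesystem. The filesystem name "cephfs" is my assumption, ranks 0 and
1 match the dual active setup, and the session-table reset is my own
addition prompted by the sessionmap assert rather than something John
listed:

    # All MDS daemons stopped first.

    # Work out which rank's journal is actually damaged:
    cephfs-journal-tool --rank=0 journal inspect
    cephfs-journal-tool --rank=1 journal inspect

    # Keep a copy of each journal before resetting anything:
    cephfs-journal-tool --rank=0 journal export backup.rank0.bin
    cephfs-journal-tool --rank=1 journal export backup.rank1.bin

    # Reset every rank's journal so they stay consistent with each other:
    cephfs-journal-tool --rank=0 journal reset
    cephfs-journal-tool --rank=1 journal reset

    # The failed assert is on the sessionmap, so the session table
    # probably needs clearing too (my addition, not John's step):
    cephfs-table-tool all reset session

    # Collapse back to a single active MDS before restarting anything:
    ceph fs reset cephfs --yes-i-really-mean-it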

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com