On Wed, Oct 11, 2017 at 1:42 AM, Bill Sharer <bsharer@xxxxxxxxxxxxxx> wrote:
> I've been in the process of updating my Gentoo-based cluster, both with
> new hardware and a somewhat postponed update. This includes some major
> changes, including the switch from gcc 4.x to 5.4.0 on the existing
> hardware and using gcc 6.4.0 to make better use of AMD Ryzen on the new
> hardware. The existing cluster was on 10.2.2, but I was going to
> 10.2.7-r1 as an interim step before moving on to 12.2.0 to begin
> transitioning to bluestore on the OSDs.
>
> The Ryzen units are slated to be bluestore-based OSD servers if and when
> I get to that point. Up until the MDS failure, they were simply cephfs
> clients. I had three OSD servers updated to 10.2.7-r1 (one is also a
> MON) and had two servers left to update. Both of these are also MONs
> and were acting as a pair of dual active MDS servers running 10.2.2.
> Monday morning I found out the hard way that a UPS one of them was on
> had a dead battery. After I fsck'd and came back up, I saw the
> following assertion error when it was trying to start its mds.B server:
>
> ==== mdsbeacon(64162/B up:replay seq 3 v4699) v7 ==== 126+0+0 (709014160 0 0) 0x7f6fb4001bc0 con 0x55f94779d8d0
>      0> 2017-10-09 11:43:06.935662 7f6fa9ffb700 -1 mds/journal.cc: In
> function 'virtual void EImportStart::replay(MDSRank*)' thread
> 7f6fa9ffb700 time 2017-10-09 11:43:06.934972
> mds/journal.cc: 2929: FAILED assert(mds->sessionmap.get_version() == cmapv)
>
>  ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x55f93d64a122]
>  2: (EImportStart::replay(MDSRank*)+0x9ce) [0x55f93d52a5ce]
>  3: (MDLog::_replay_thread()+0x4f4) [0x55f93d4a8e34]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x55f93d25bd4d]
>  5: (()+0x74a4) [0x7f6fd009b4a4]
>  6: (clone()+0x6d) [0x7f6fce5a598d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 newstore
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 kinetic
>    1/ 5 fuse
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent 10000
>   max_new 1000
>   log_file /var/log/ceph/ceph-mds.B.log
>
> When I was googling around, I ran into this CERN presentation and tried
> out the offline backward scrubbing commands on slide 25 first:
>
> https://indico.cern.ch/event/531810/contributions/2309925/attachments/1357386/2053998/GoncaloBorges-HEPIX16-v3.pdf
>
> Both ran without any messages, so I'm assuming I have sane contents in
> the cephfs_data and cephfs_metadata pools. Still no luck getting things
> restarted, so I tried the cephfs-journal-tool journal reset on slide
> 23. That didn't work either.
> Just for giggles, I tried setting up the two Ryzen boxes as new mds.C
> and mds.D servers, which would run on 10.2.7-r1 instead of using mds.A
> and mds.B (10.2.2). The D server fails with the same assert, as follows:
>
> === 132+0+1979520 (4198351460 0 1611007530) 0x7fffc4000a70 con 0x7fffe0013310
>      0> 2017-10-09 13:01:31.571195 7fffd99f5700 -1 mds/journal.cc: In
> function 'virtual void EImportStart::replay(MDSRank*)' thread
> 7fffd99f5700 time 2017-10-09 13:01:31.570608
> mds/journal.cc: 2949: FAILED assert(mds->sessionmap.get_version() == cmapv)
>
>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x555555b7ebc8]
>  2: (EImportStart::replay(MDSRank*)+0x9ea) [0x555555a5674a]
>  3: (MDLog::_replay_thread()+0xe51) [0x5555559cef21]
>  4: (MDLog::ReplayThread::entry()+0xd) [0x5555557778cd]
>  5: (()+0x7364) [0x7ffff7bc5364]
>  6: (clone()+0x6d) [0x7ffff6051ccd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.

Because this system was running multiple active MDSs on Jewel (based on
seeing an EImportStart journal entry), and that configuration was known
to be unstable, I would advise you to blow away the filesystem and create
a fresh one on Luminous (where multi-MDS is stable), rather than trying
to debug it. Going back to work out what went wrong in the Jewel code is
probably not a very valuable activity unless you have irreplaceable data.

If you do want to get this filesystem back on its feet in place (first
stopping all MDSs): I'm guessing that your cephfs-journal-tool reset
didn't help because you had multiple MDS ranks, and that tool just
operates on rank 0 by default. You need to work out which rank's journal
is actually damaged (the rank is part of the prefix of the MDS log
messages), and then pass a --rank argument to cephfs-journal-tool. You
will also need to reset all the other ranks' journals to keep things
consistent, and then do a "ceph fs reset" so that the filesystem starts
up with a single MDS next time; there is a rough sketch of the commands
at the end of this mail.

If you get the filesystem up and running again, I'd still recommend
copying anything important off it and creating a new filesystem on
Luminous, rather than continuing to run with maybe-still-subtly-damaged
metadata.

John
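P.S. A minimal sketch of that recovery sequence. It assumes the filesystem
has the default name "cephfs" and that ranks 0 and 1 were your two active
MDSs; both of those are assumptions on my part, so substitute your own
values. The exact form of the --rank argument also differs between
releases (some builds take just the rank number, newer ones take
<fs_name>:<rank>), so check cephfs-journal-tool --help on your version:

  # with every MDS daemon stopped
  cephfs-journal-tool --rank=0 journal reset    # or --rank=cephfs:0 on newer builds
  cephfs-journal-tool --rank=1 journal reset    # repeat for every rank that was active
  ceph fs reset cephfs --yes-i-really-mean-it   # drop back to a single active MDS

After that, bring one MDS back up and see whether it gets through replay
this time.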