We accidentally found ourselves upgraded from 12.2.8 to 13.2.2 after a ceph-deploy install went awry (we were expecting it to upgrade to 12.2.9, not jump a major release without warning).

As a result we ended up with an MDS journal error and one daemon reported as damaged. Having got nowhere asking for help on IRC, we followed various forum posts and disaster recovery guides and ended up resetting the journal (the rough commands we used are at the end of this message). That cleared the "damaged" state, but the MDS now segfaults while trying to replay:

/build/ceph-13.2.2/src/mds/journal.cc: 1572: FAILED assert(g_conf->mds_wipe_sessions)

 ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fad637f70f2]
 2: (()+0x3162b7) [0x7fad637f72b7]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x5f4b) [0x7a7a6b]
 4: (EUpdate::replay(MDSRank*)+0x39) [0x7a8fa9]
 5: (MDLog::_replay_thread()+0x864) [0x752164]
 6: (MDLog::ReplayThread::entry()+0xd) [0x4f021d]
 7: (()+0x76ba) [0x7fad6305a6ba]
 8: (clone()+0x6d) [0x7fad6288341d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

full logs

We have been unable to access the cephfs file system since all of this started; attempts to mount it fail with reports of "mds probably not available":

Oct 28 23:47:02 mirrors kernel: [115602.911193] ceph: probably no mds server is up

root@mds02:~# ceph -s
  cluster:
    id:     78d5bf7d-b074-47ab-8d73-bd4d99df98a5
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            too many PGs per OSD (276 > max 250)

  services:
    mon: 3 daemons, quorum mon01,mon02,mon03
    mgr: mon01(active), standbys: mon02, mon03
    mds: fido_fs-2/2/1 up {0=mds01=up:resolve,1=mds02=up:replay(laggy or crashed)}
    osd: 27 osds: 27 up, 27 in

  data:
    pools:   15 pools, 3168 pgs
    objects: 16.97 M objects, 30 TiB
    usage:   71 TiB used, 27 TiB / 98 TiB avail
    pgs:     3168 active+clean

  io:
    client: 680 B/s rd, 1.1 MiB/s wr, 0 op/s rd, 345 op/s wr

Before I just trash the entire fs and give up on ceph, does anyone have any suggestions as to how we can fix this?

root@mds02:~# ceph versions
{
    "mon": {
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
    },
    "mgr": {
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 3
    },
    "osd": {
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27
    },
    "mds": {
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 2
    },
    "overall": {
        "ceph version 12.2.8 (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable)": 27,
        "ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)": 8
    }
}
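For reference, the journal reset mentioned above was done by following the upstream cephfs disaster-recovery documentation. From memory the sequence was roughly the following (illustrative rather than copied verbatim from shell history, and on mimic the tool may also want an explicit --rank=fido_fs:<rank>):

    # keep a copy of the journal before touching anything
    cephfs-journal-tool journal export backup.bin

    # write any recoverable dentries from the journal back into the metadata pool
    cephfs-journal-tool event recover_dentries summary

    # then reset (truncate) the journal itself
    cephfs-journal-tool journal reset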
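Also, reading the assert itself, journal.cc only seems to tolerate this replay condition when mds_wipe_sessions is set. One thing we are wondering about, but have not tried and would like confirmation on before making things worse, is temporarily enabling it on the affected MDS, e.g. in ceph.conf:

    [mds]
    # hypothetical workaround, untested by us: enables the code path the
    # assert is guarding; presumably should be removed again once replay completes
    mds_wipe_sessions = true

Is that a sane thing to attempt here, or does it risk further session/metadata damage?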