Re: MDS damaged after mimic 13.2.1 to 13.2.2 upgrade

On 08/10/18 10:20, Yan, Zheng wrote:
On Mon, Oct 8, 2018 at 9:07 PM Alfredo Daniel Rezinovsky
<alfrenovsky@xxxxxxxxx> wrote:


On 08/10/18 09:45, Yan, Zheng wrote:
On Mon, Oct 8, 2018 at 6:40 PM Alfredo Daniel Rezinovsky
<alfrenovsky@xxxxxxxxx> wrote:
On 08/10/18 07:06, Yan, Zheng wrote:
On Mon, Oct 8, 2018 at 5:43 PM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
On 8.10.2018, at 12:37, Yan, Zheng <ukernel@xxxxxxxxx> wrote:

On Mon, Oct 8, 2018 at 4:37 PM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
What additional steps need to be taken in order to (try to) regain access to the fs, given that I backed up the metadata pool, created an alternate metadata pool, and ran scan_extents, scan_links, scan_inodes, and a (somewhat) recursive scrub?
After that I only mounted the fs read-only to back up the data.
Would anything even work given that the mds journal and purge queue were truncated?

Did you back up the whole metadata pool? Did you make any modifications
to the original metadata pool? If you did, what modifications?
I backed up both the journal and the purge queue and used cephfs-journal-tool to recover dentries, then reset the journal and purge queue on the original metadata pool.
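Roughly, that sequence was something like this (a sketch, not necessarily
the exact invocations, assuming rank 0 and backup file names backup.bin /
pq-backup.bin):

# cephfs-journal-tool journal export backup.bin
# cephfs-journal-tool --journal=purge_queue journal export pq-backup.bin
# cephfs-journal-tool event recover_dentries summary
# cephfs-journal-tool journal reset
# cephfs-journal-tool --journal=purge_queue journal reset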
You can try restoring the original journal and purge queue, then downgrade
the mds to 13.2.1. Journal object names are 20x.xxxxxxxx; purge queue
object names are 50x.xxxxxxxxx.
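If you want to copy those objects aside before touching anything, a rough
sketch using rados (assuming rank 0, so the relevant objects are 200.* and
500.*, and a metadata pool named cephfs_metadata):

# mkdir -p backup
# rados -p cephfs_metadata ls | egrep '^(200|500)\.' > objects.txt
# while read obj; do rados -p cephfs_metadata get "$obj" "backup/$obj"; done < objects.txt

Restoring them later would be the same loop with 'put' instead of 'get'.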
I've already done a scan_extents and am in the middle of a scan_inodes. Do I
need to finish with scan_links?

I'm on 13.2.2. Do I finish scan_links and then downgrade?

I have a backup done with "cephfs-journal-tool journal export
backup.bin". I don't think I have the purge queue.

Can I reset the purge queue journal? Can I import an empty file?

It's better to restore the journal to the original metadata pool and reset
the purge queue to empty, then try starting the mds. Resetting the purge
queue will leave some objects in an orphaned state, but we can handle them
later.
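Concretely, something along these lines (a sketch, assuming rank 0, a
journal backup in backup.bin, and an mds id equal to the short hostname):

# cephfs-journal-tool journal import backup.bin
# cephfs-journal-tool --journal=purge_queue journal reset
# systemctl restart ceph-mds@$(hostname -s)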

Regards
Yan, Zheng
Let's see...

"cephfs-journal-tool journal import  backup.bin" will restore the whole
metadata?
Is that what "journal" means?

It just restores the journal. If you only reset the original fs's journal
and purge queue (and ran the scan_foo commands against an alternate
metadata pool), it's highly likely that restoring the journal will bring
your fs back.
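After the import you can sanity-check it before restarting anything:

# cephfs-journal-tool journal inspect

It should report "Overall journal integrity: OK" if the import went
through cleanly.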



So I can stop cephfs-data-scan, run the import, downgrade, and then
reset the purge queue?

You said you have already run scan_extents and scan_inodes. What
cephfs-data-scan command is running?
Already ran (without an alternate metadata pool):

time cephfs-data-scan scan_extents cephfs_data # 10 hours

time cephfs-data-scan scan_inodes cephfs_data # running 3 hours
with a warning:
7fddd8f64ec0 -1 datascan.inject_with_backtrace: Dentry 0x0x10000db852b/dovecot.index already exists but points to 0x0x1000134f97f

Still not run:

time cephfs-data-scan scan_links
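(As an aside: scan_extents and scan_inodes can be parallelized across
several workers, which should cut those multi-hour runtimes. A sketch,
assuming the worker_n/worker_m options from the mimic disaster-recovery
docs and the same cephfs_data pool:

# cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 cephfs_data &
# cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 cephfs_data &
# cephfs-data-scan scan_extents --worker_n 2 --worker_m 4 cephfs_data &
# cephfs-data-scan scan_extents --worker_n 3 --worker_m 4 cephfs_data &
# wait

All workers of one phase must finish before the next phase starts;
scan_links has no worker options and runs single-threaded.)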


After importing the original journal, run 'ceph mds repaired
fs_name:damaged_rank', then try restarting the mds. Check if the mds can
start.
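That is, something like this (assuming the filesystem is named cephfs,
rank 0 is the damaged rank, and the mds id is the short hostname):

# ceph mds repaired cephfs:0
# systemctl restart ceph-mds@$(hostname -s)
# ceph fs status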

Please remind me of the commands:
I've been 3 days without sleep, and I don't want to break it any further.

Sorry for that.
I updated on Friday and broke a golden rule: "READ ONLY FRIDAY". My fault.
Thanks



What do I do with the journals?

Before proceeding to the alternate metadata pool recovery I was able to start the MDS, but it soon failed, throwing lots of 'loaded dup inode' errors; I'm not sure whether that involved changing anything in the pool.
I have left the original metadata pool untouched since then.


Yan, Zheng

On 8.10.2018, at 05:15, Yan, Zheng <ukernel@xxxxxxxxx> wrote:

Sorry, this was caused by a wrong backport. Downgrading the mds to 13.2.1 and
marking the mds repaired can resolve this.

Yan, Zheng
On Sat, Oct 6, 2018 at 8:26 AM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
Update:
I discovered http://tracker.ceph.com/issues/24236 and https://github.com/ceph/ceph/pull/22146
Make sure that it is not relevant in your case before proceeding to operations that modify on-disk data.


On 6.10.2018, at 03:17, Sergey Malinin <hell@xxxxxxxxxxx> wrote:

I ended up rescanning the entire fs using the alternate metadata pool approach described in http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
The process has not completed yet because during the recovery our cluster encountered another problem with the OSDs, which I got fixed yesterday (thanks to Igor Fedotov @ SUSE).
The first stage (scan_extents) completed in 84 hours (120M objects in the data pool on 8 HDD OSDs across 4 hosts). The second (scan_inodes) was interrupted by the OSD failure, so I have no timing stats, but it seems to be running 2-3 times faster than the extents scan.
As to the root cause: in my case I recall that during the upgrade I had forgotten to restart 3 OSDs, one of which was holding metadata pool contents, before restarting the MDS daemons. That seems to have contributed to the MDS journal corruption, because when I restarted those OSDs, the MDS was able to start up but soon failed, throwing lots of 'loaded dup inode' errors.
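One cheap safeguard against exactly that (restarting MDS daemons while
some OSDs still run the old version) is to check what versions the
cluster reports before touching the MDS:

# ceph versions

If any OSD still shows the old version, restart it first.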


On 6.10.2018, at 00:41, Alfredo Daniel Rezinovsky <alfrenovsky@xxxxxxxxx> wrote:

Same problem...

# cephfs-journal-tool --journal=purge_queue journal inspect
2018-10-05 18:37:10.704 7f01f60a9bc0 -1 Missing object 500.0000016c
Overall journal integrity: DAMAGED
Objects missing:
0x16c
Corrupt regions:
0x5b000000-ffffffffffffffff

Just after upgrading to 13.2.2.

Did you fix it?


On 26/09/18 13:05, Sergey Malinin wrote:

Hello,
I followed the standard upgrade procedure to upgrade from 13.2.1 to 13.2.2.
After the upgrade the MDS cluster is down; mds rank 0 and the purge_queue journal are damaged. Resetting the purge_queue does not seem to help, as the journal still appears to be damaged.
Can anybody help?

mds log:

-789> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.mds2 Updating MDS map to version 586 from mon.2
-788> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map i am now mds.0.583
-787> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map state change up:rejoin --> up:active
-786> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 recovery_done -- successful recovery!
<skip>
    -38> 2018-09-26 18:42:32.707 7f70f28a7700 -1 mds.0.purge_queue _consume: Decode error at read_pos=0x322ec6636
    -37> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 set_want_state: up:active -> down:damaged
    -36> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 _send down:damaged seq 137
    -35> 2018-09-26 18:42:32.707 7f70f28a7700 10 monclient: _send_mon_message to mon.ceph3 at mon:6789/0
    -34> 2018-09-26 18:42:32.707 7f70f28a7700  1 -- mds:6800/e4cc09cf --> mon:6789/0 -- mdsbeacon(14c72/mds2 down:damaged seq 137 v24a) v7 -- 0x563b321ad480 con 0
<skip>
     -3> 2018-09-26 18:42:32.743 7f70f98b5700  5 -- mds:6800/3838577103 >> mon:6789/0 conn(0x563b3213e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8 cs=1 l=1). rx mon.2 seq 29 0x563b321ab880 mdsbeaco
n(85106/mds2 down:damaged seq 311 v587) v7
     -2> 2018-09-26 18:42:32.743 7f70f98b5700  1 -- mds:6800/3838577103 <== mon.2 mon:6789/0 29 ==== mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7 ==== 129+0+0 (3296573291 0 0) 0x563b321ab880 con 0x563b3213e
000
     -1> 2018-09-26 18:42:32.743 7f70f98b5700  5 mds.beacon.mds2 handle_mds_beacon down:damaged seq 311 rtt 0.038261
      0> 2018-09-26 18:42:32.743 7f70f28a7700  1 mds.mds2 respawn!

# cephfs-journal-tool --journal=purge_queue journal inspect
Overall journal integrity: DAMAGED
Corrupt regions:
0x322ec65d9-ffffffffffffffff

# cephfs-journal-tool --journal=purge_queue journal reset
old journal was 13470819801~8463
new journal start will be 13472104448 (1276184 bytes past old end)
writing journal head
done

# cephfs-journal-tool --journal=purge_queue journal inspect
2018-09-26 19:00:52.848 7f3f9fa50bc0 -1 Missing object 500.00000c8c
Overall journal integrity: DAMAGED
Objects missing:
0xc8c
Corrupt regions:
0x323000000-ffffffffffffffff




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


