Re: "mds daemon damaged" after restarting MDS - Filesystem DOWN

Hi all,
I'm helping Luca with this a bit and we've made some progress.

We currently have an MDS that starts, and we're able to see the files.
But when browsing the filesystem we get a lot of "loaded dup inode"
warnings, e.g.

2020-02-12 08:47:44.546063 mds.ceph-mon-01 [ERR] loaded dup inode
0x10000000000 [2,head] v16509035 at /pawsey-sync/nextcloud-data, but
inode 0x10000000000.head v16508956 already exists at /nextcloud-data

I believe the origin of this dup inode is:
   1. Last week before the outage several files in the root / were
moved to a subdir /pawsey-sync/  (e.g. mv /nextcloud-data
/pawsey-sync/nextcloud-data)
   2. (Then there was a memory issue on the mds node and after some
restarts the mds journal became corrupted.)
   3. Following the disaster recovery procedure [1],
cephfs-journal-tool event recover_dentries summary was executed. Luca
reports that it took 30 minutes to run -- I believe it re-applied
several mds ops redundantly, including moving those top-level entries
back to the root /.
   4. Now there are dup inodes, since the old mds ops were re-applied.
(A quick way to check this is sketched just below.)
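
To sanity-check that theory, I was thinking of something like the
following (only a sketch -- it assumes the metadata pool is named
"pawsey-sync-metadata", so substitute the real pool name, and that the
root dirfrag object is 1.00000000 with dentry keys of the usual
<name>_head form):

# list the dentries stored in the root directory's dirfrag object
rados -p pawsey-sync-metadata listomapkeys 1.00000000 | grep nextcloud-data
# seeing "nextcloud-data_head" still listed here, even though the
# directory was moved to /pawsey-sync/, would match the dup-inode
# theory above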

The total amount of data here is not very large, so I'd like to ask the
advice of the list. Should we:

  A. rsync the data somewhere else and recreate the cephfs from
scratch? (I'm concerned the MDS might crash again while we read out
all the current state of the FS...)
  B. write a script to rados rmomapkey the oldest dentry of each dup
inode, then run the cephfs-data-scan consistency checks as described
here [2]? (Rough sketch of what I mean below.)
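
For B, roughly what I have in mind (same assumptions as above: a
metadata pool named "pawsey-sync-metadata", the stale copy being the
dentry re-created in the root, and the usual <name>_head key naming --
not meant as an authoritative procedure):

# with the MDS stopped, drop the stale root-level dentry of the dup:
rados -p pawsey-sync-metadata rmomapkey 1.00000000 nextcloud-data_head
# then check/repair linkage as in [1]/[2]:
cephfs-data-scan scan_links
# and finally a recursive forward scrub with repair of / once the MDS
# is back up (scrub_path / "scrub start", depending on the release)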

Any thoughts?

Thanks very much in advance!

Dan

[1] https://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036279.html

On Mon, Feb 10, 2020 at 7:03 AM Luca Cervigni
<luca.cervigni@xxxxxxxxxxxxx> wrote:
>
> Today I tried:
> - stopped all MDSs again
> - forced the rank back up with "ceph mds repaired"
> - starting the MDS leads to another crash, with the MDS not starting.
>
> ceph-post-file: 3775769f-dc90-4ce8-a0b8-6f2bfea14d6c
>
> Then I tried to follow the disaster recovery page:
> https://docs.ceph.com/docs/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts
>
> I ran:
> cephfs-journal-tool --rank=pawsey-sync-fs:0 event recover_dentries summary
> cephfs-journal-tool --rank=pawsey-sync-fs:0 journal reset
> cephfs-table-tool all reset session
>
> Now the MDS starts and I can see files, with several errors in the logs.
> This is private though, so I posted it there:
>
> ceph-post-file: 6f9f6a57-52ea-4657-878e-0e60e5a069c2
>
> 2020-02-10 05:36:36.457 7f8afab48700  0 mds.0.cache.dir(0x609) _fetched
> badness: got (but i already had) [inode 0x1000008370d [2,head] "/correct
> file path" auth v4466 s=2997 n(v0 rc2020-01-16 01:39:10.112907 b2997
> 1=1+0) (ifi
>
> So the files seem to be there somewhere, but when I mount the
> directory, I cannot see them:
> There should be a directory in / but I can see only the files. The
> directory seems to show up as a number:
>
> root@xxxxxxx:/mnt/cephfs# ls -la
> total 5
> drwxr-xr-x 5 root root    3 Feb  4 14:45 .
> drwxr-xr-x 4 root root 4096 Jan 15 15:41 ..
> -rw------- 1 root root    0 Feb  4 14:45 .asd.swp
> -rw------- 1 root root    0 Feb  4 14:45 .asd.swpx
> -rw------- 1 root root    0 Feb  4 14:44 .pippo.swp
> -rw------- 1 root root    0 Feb  4 14:44 .pippo.swpx
> -rw-r--r-- 1 root root    0 Feb  4 14:44 4913 <----- maybe this is the
> missing DIR
> -rw-r--r-- 1 root root    7 Feb  4 14:45 asd
> -rw-r--r-- 1 root root    0 Feb  4 14:44 pippo
> -rw-r--r-- 1 root root    0 Feb  4 14:44 pippo~
>
> How do I recover my files?
>
>
>
>
>
>
> On 7/2/20 5:03 pm, Luca Cervigni wrote:
> > Related Bug with DEV logs and crash logs
> >
> > https://tracker.ceph.com/issues/44030
> >
> > On 7/2/20 4:06 pm, Luca Cervigni wrote:
> >> Not sure if the previous message went through, since I was not
> >> subscribed. If so, sorry for the spam.
> >>
> >> Dear all
> >>
> >> Running nautilus 14.2.7. The data in the FS are important and cannot
> >> be lost.
> >>
> >> Today I increased the PG count of the volume pool from 8k to 16k. The
> >> active MDS started reporting slow ops (the filesystem is not in the
> >> volume pool). After a few hours the FS was very slow; I reduced the
> >> backfill to 1 and, since the situation was not improving, I restarted
> >> the MDS (no other standby MDSs -- it was a single MDS).
> >>
> >> After that, the crash. The MDS does not go back up, with this error:
> >>
> >> 2020-02-07 07:03:32.477 7fbf69647700 -1 NetHandler create_socket
> >> couldn't create socket (97) Address family not supported by protocol
> >> 2020-02-07 07:03:32.541 7fbf65e6a700  1 mds.ceph-mon-01 Updating MDS
> >> map to version 48461 from mon.2
> >> 2020-02-07 07:03:37.613 7fbf65e6a700  1 mds.ceph-mon-01 Updating MDS
> >> map to version 48462 from mon.2
> >> 2020-02-07 07:03:37.613 7fbf65e6a700  1 mds.ceph-mon-01 Map has
> >> assigned me to become a standby
> >> 2020-02-07 07:14:11.789 7fbf66e42700 -1 received  signal: Terminated
> >> from /sbin/init  (PID: 1) UID: 0
> >> 2020-02-07 07:14:11.789 7fbf66e42700 -1 mds.ceph-mon-01 *** got
> >> signal Terminated ***
> >> 2020-02-07 07:14:11.789 7fbf66e42700  1 mds.ceph-mon-01 suicide!
> >> Wanted state up:standby
> >> 2020-02-07 07:14:12.565 7fbf65e6a700  0 ms_deliver_dispatch:
> >> unhandled message 0x563fcb438d00 mdsmap(e 48465) v1 from mon.2
> >> v1:10.3.78.32:6789/0
> >> 2020-02-07 07:25:16.782 7f26c39de2c0  0 set uid:gid to 64045:64045
> >> (ceph:ceph)
> >> 2020-02-07 07:25:16.782 7f26c39de2c0  0 ceph version 14.2.7
> >> (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
> >> ceph-mds, pid 3724
> >> 2020-02-07 07:25:16.782 7f26c39de2c0  0 pidfile_write: ignore empty
> >> --pid-file
> >> 2020-02-07 07:25:16.786 7f26b5326700 -1 NetHandler create_socket
> >> couldn't create socket (97) Address family not supported by protocol
> >> 2020-02-07 07:25:16.790 7f26b1b49700  1 mds.ceph-mon-01 Updating MDS
> >> map to version 48472 from mon.0
> >> 2020-02-07 07:25:17.691 7f26b1b49700  1 mds.ceph-mon-01 Updating MDS
> >> map to version 48473 from mon.0
> >> 2020-02-07 07:25:17.691 7f26b1b49700  1 mds.ceph-mon-01 Map has
> >> assigned me to become a standby
> >> 2020-02-07 07:29:50.306 7f26b2b21700 -1 received  signal: Terminated
> >> from /sbin/init  (PID: 1) UID: 0
> >> 2020-02-07 07:29:50.306 7f26b2b21700 -1 mds.ceph-mon-01 *** got
> >> signal Terminated ***
> >> 2020-02-07 07:29:50.306 7f26b2b21700  1 mds.ceph-mon-01 suicide!
> >> Wanted state up:standby
> >> 2020-02-07 07:29:50.526 7f26b5b27700  1 mds.beacon.ceph-mon-01
> >> discarding unexpected beacon reply down:dne seq 70 dne
> >> 2020-02-07 07:29:52.802 7f26b1b49700  0 ms_deliver_dispatch:
> >> unhandled message 0x55ef110ab200 mdsmap(e 48474) v1 from mon.0
> >> v1:10.3.78.22:6789/0
> >>
> >> Rebooting did not help.
> >>
> >> I asked on #ceph (OFTC) and they suggested bringing up another "fresh"
> >> MDS. I did that, and it does not start, going to standby. LOGS:
> >>
> >> 2020-02-07 07:12:46.696 7fe4b388b2c0  0 set uid:gid to 64045:64045
> >> (ceph:ceph)
> >> 2020-02-07 07:12:46.696 7fe4b388b2c0  0 ceph version 14.2.7
> >> (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
> >> ceph-mds, pid 74742
> >> 2020-02-07 07:12:46.696 7fe4b388b2c0  0 pidfile_write: ignore empty
> >> --pid-file
> >> 2020-02-07 07:12:46.704 7fe4a19f6700  1 mds.ceph-mon-02 Updating MDS
> >> map to version 48462 from mon.0
> >> 2020-02-07 07:12:47.456 7fe4a19f6700  1 mds.ceph-mon-02 Updating MDS
> >> map to version 48463 from mon.0
> >> 2020-02-07 07:12:47.456 7fe4a19f6700  1 mds.ceph-mon-02 Map has
> >> assigned me to become a standby
> >> 2020-02-07 07:14:16.615 7fe4a29ce700 -1 received  signal: Terminated
> >> from /sbin/init  (PID: 1) UID: 0
> >> 2020-02-07 07:14:16.615 7fe4a29ce700 -1 mds.ceph-mon-02 *** got
> >> signal Terminated ***
> >> 2020-02-07 07:14:16.615 7fe4a29ce700  1 mds.ceph-mon-02 suicide!
> >> Wanted state up:standby
> >> 2020-02-07 07:14:16.947 7fe4a51d3700  1 mds.beacon.ceph-mon-02
> >> discarding unexpected beacon reply down:dne seq 24 dne
> >> 2020-02-07 07:14:18.715 7fe4a19f6700  0 ms_deliver_dispatch:
> >> unhandled message 0x5602fbc6df80 mdsmap(e 48466) v1 from mon.0
> >> v2:10.3.78.22:3300/0
> >> 2020-02-07 07:25:02.093 7f3c2f92a2c0  0 set uid:gid to 64045:64045
> >> (ceph:ceph)
> >> 2020-02-07 07:25:02.093 7f3c2f92a2c0  0 ceph version 14.2.7
> >> (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable), process
> >> ceph-mds, pid 75471
> >> 2020-02-07 07:25:02.093 7f3c2f92a2c0  0 pidfile_write: ignore empty
> >> --pid-file
> >> 2020-02-07 07:25:02.097 7f3c1da95700  1 mds.ceph-mon-02 Updating MDS
> >> map to version 48471 from mon.2
> >> 2020-02-07 07:25:06.413 7f3c1da95700  1 mds.ceph-mon-02 Updating MDS
> >> map to version 48472 from mon.2
> >> 2020-02-07 07:25:06.413 7f3c1da95700  1 mds.ceph-mon-02 Map has
> >> assigned me to become a standby
> >> 2020-02-07 07:29:56.869 7f3c1ea6d700 -1 received  signal: Terminated
> >> from /sbin/init  (PID: 1) UID: 0
> >> 2020-02-07 07:29:56.869 7f3c1ea6d700 -1 mds.ceph-mon-02 *** got
> >> signal Terminated ***
> >> 2020-02-07 07:29:56.869 7f3c1ea6d700  1 mds.ceph-mon-02 suicide!
> >> Wanted state up:standby
> >> 2020-02-07 07:29:58.113 7f3c1da95700  0 ms_deliver_dispatch:
> >> unhandled message 0x563c5df33f80 mdsmap(e 48475) v1 from mon.2
> >> v2:10.3.78.32:3300/0
> >>
> >> Here ceph status
> >>
> >>   cluster:
> >>     id:     a8dde71d-ca7b-4cf5-bd38-8989c6a27011
> >>     health: HEALTH_ERR
> >>             1 filesystem is degraded
> >>             1 filesystem is offline
> >>             1 mds daemon damaged
> >>             2 daemons have recently crashed
> >>
> >>   services:
> >>     mon: 3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03 (age 41m)
> >>     mgr: ceph-mon-02(active, since 41m), standbys: ceph-mon-03,
> >> ceph-mon-01
> >>     mds: pawsey-sync-fs:0/1, 1 damaged
> >>     osd: 925 osds: 715 up (since 2h), 715 in (since 23h)
> >>     rgw: 3 daemons active (radosgw-01, radosgw-02, radosgw-03)
> >>
> >>   data:
> >>     pools:   24 pools, 26569 pgs
> >>     objects: 52.64M objects, 199 TiB
> >>     usage:   685 TiB used, 6.7 PiB / 7.3 PiB avail
> >>     pgs:     26513 active+clean
> >>              54    active+clean+scrubbing+deep
> >>              2     active+clean+scrubbing
> >>
> >> Ceph osd ls detail: https://pastebin.com/raw/bxi4HSa5
> >>
> >> the metadata pool is on NVMe
> >>
> >> Can anyone give me some help?
> >>
> >> Any commands I run, like journal repairs, do not work, as they expect
> >> the MDS to be up.
> >>
> >> Thanks
> >>
> >> Cheers
> >>
> >>
> --
> Luca Cervigni
> Infrastructure Architect
>
> Tel. +61864368802
> Pawsey Supercomputing Centre
> 1 Bryce Ave, Kensington WA 6151
> Australia
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



