Hi!
Yes, resetting the journals is exactly what we did, quite a while ago, when the MDS ran out of memory because a journal entry contained an absurdly large number (I think it may have been an inode number). We probably also reset the inode table later, which I have recently learned resets an on-disk data structure, and which probably started us overwriting inodes, dentries, or both.
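(For anyone finding this thread in the archives later: I believe the commands involved are the ones from the CephFS disaster-recovery docs, roughly as below. I'm reconstructing from memory, so the filesystem name and rank are placeholders rather than an exact record of what we ran.)

    cephfs-journal-tool --rank=<fs_name>:0 journal reset   # wipe the MDS journal
    cephfs-table-tool all reset inode                       # reset the on-disk inode table (InoTable)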
So I take it (we are learning about filesystems very quickly over here) that Ceph has been reusing inode numbers. Re-scanning the dentries will somehow figure out which dentry is most recent and remove the older (now wrong) one, and presumably it can handle hard links as well (we have few, if any, of those).
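Based on your suggestion, my understanding of the sequence we will try next is roughly the following (again taken from the disaster-recovery docs; <fs_name> and rank 0 are placeholders, and the exact invocation may differ on our version):

    cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary   # salvage what dentries we can first
    cephfs-journal-tool --rank=<fs_name>:0 journal reset                    # then truncate the journal
    cephfs-data-scan scan_links      # should drop the older of any duplicate dentries
    ceph mds repaired <fs_name>:0    # mark the rank repaired so an MDS can claim it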
Thanks very much for your help. This has been fascinating.
Neale
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: Monday, October 28, 2019 12:52
To: Pickett, Neale T
Cc: ceph-users
Subject: Re: Problematic inode preventing ceph-mds from starting

On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T <neale@xxxxxxxx> wrote:
> In the last week we have made a few changes to the down filesystem in an attempt to fix what we thought was an inode problem:
>
> cephfs-data-scan scan_extents  # about 1 day with 64 processes
> cephfs-data-scan scan_inodes   # about 1 day with 64 processes
> cephfs-data-scan scan_links    # about 1 day

Did you reset the journals or perform any other disaster recovery commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated inodes, until it stopped. All looked well until we began bringing production services back online, at which point many error messages appeared, the mds went back into damaged, and the fs back to degraded. At this point I removed the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
> -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc: 258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this probably requires significant intervention and (meta)data loss for recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)
- Reset the MDS journal [1] and optionally recover any dentries first. (This will hopefully resolve the ESubtreeMap errors you pasted.) Note that some metadata may be lost through this command.
- `cephfs-data-scan scan_links` again. This should repair any duplicate inodes (by dropping the older dentries).
- Then you can try marking the rank as repaired.

Good luck!

[1] https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D