Re: Problematic inode preventing ceph-mds from starting

Hi!


Yes, resetting journals is exactly what we did, quite a while ago, when the MDS ran out of memory because a journal entry contained an absurdly large number (I think it may have been an inode number). We probably also reset the inode table later, which I recently learned resets an on-disk data structure, and which probably set us up to start overwriting inodes, dentries, or both.
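
For the record, I believe what we ran back then was roughly the standard disaster-recovery sequence, something like the following ("ourfs" standing in for our filesystem name; I'd have to dig through shell history to confirm the exact invocations):

    cephfs-journal-tool --rank=ourfs:0 journal reset
    cephfs-table-tool all reset inode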


So I take it (we are learning about filesystems very quickly over here) that Ceph has been reusing inode numbers. Re-scanning the dentries with scan_links will somehow figure out which dentry for a given inode is the most recent and drop the older (now wrong) one. And presumably it can handle hard links too (we don't have many, if any, of those).


Thanks very much for your help. This has been fascinating.


Neale






From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: Monday, October 28, 2019 12:52
To: Pickett, Neale T
Cc: ceph-users
Subject: Re: Problematic inode preventing ceph-mds from starting
 
On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T <neale@xxxxxxxx> wrote:
> In the last week we have made a few changes to the down filesystem in an attempt to fix what we thought was an inode problem:
>
>
> cephfs-data-scan scan_extents   # about 1 day with 64 processes
>
> cephfs-data-scan scan_inodes   # about 1 day with 64 processes
>
> cephfs-data-scan scan_links   # about 1 day

Did you reset the journals or perform any other disaster recovery
commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated inodes, until it stopped. All looked well until we began bringing production services back online, at which point many error messages appeared, the mds went back into damaged, and the fs back to degraded. At this point I removed the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
>     -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc: 258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this
probably requires significant intervention and (meta)data loss for
recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)

- Reset the MDS journal [1] and optionally recover any dentries first.
(This will hopefully resolve the ESubtreeMap errors you pasted.) Note
that some metadata may be lost through this command.

- `cephfs-data-scan scan_links` again. This should repair any
duplicate inodes (by dropping the older dentries).

- Then you can try marking the rank as repaired.
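
Roughly, the commands for the above would look something like this (a sketch following the disaster recovery docs [1]; it assumes a single active rank 0 and a filesystem named "cephfs", so adjust to your setup):

    # back up the journal before touching it
    cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

    # optionally recover dentries from the journal first
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary

    # reset the journal (recent metadata changes may be lost)
    cephfs-journal-tool --rank=cephfs:0 journal reset

    # re-run the links scan to drop the older duplicate dentries
    cephfs-data-scan scan_links

    # then mark the rank repaired so an MDS can take over
    ceph mds repaired cephfs:0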

Good luck!

[1] https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation


--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
