Hi!
Yes, resetting the journals is exactly what we did, quite a while ago, when the MDS ran out of memory because a journal entry contained an absurdly large number (I think it may have been an inode number). We probably also reset the inode table later, which I have recently learned resets an on-disk data structure, and which probably started us overwriting inodes, dentries, or both.
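(For anyone finding this thread in the archives later: I believe the commands involved are the ones from the CephFS disaster-recovery docs, roughly as below. I'm reconstructing from memory, so the filesystem name and rank are placeholders rather than an exact record of what we ran.)

    cephfs-journal-tool --rank=<fs_name>:0 journal reset   # wipe the MDS journal
    cephfs-table-tool all reset inode                       # reset the on-disk inode table (InoTable)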
So I take it (we are learning about filesystems very quickly over here) that Ceph has been reusing inode numbers. Re-scanning the dentries will somehow figure out which dentry is most recent and remove the older (now wrong) one, and presumably it can handle hard links as well (we have few, if any, of those).
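Based on your suggestion, my understanding of the sequence we will try next is roughly the following (again taken from the disaster-recovery docs; <fs_name> and rank 0 are placeholders, and the exact invocation may differ on our version):

    cephfs-journal-tool --rank=<fs_name>:0 event recover_dentries summary   # salvage what dentries we can first
    cephfs-journal-tool --rank=<fs_name>:0 journal reset                    # then truncate the journal
    cephfs-data-scan scan_links      # should drop the older of any duplicate dentries
    ceph mds repaired <fs_name>:0    # mark the rank repaired so an MDS can claim it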
Thanks very much for your help. This has been fascinating.
Neale
From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
Sent: Monday, October 28, 2019 12:52
To: Pickett, Neale T
Cc: ceph-users
Subject: Re: Problematic inode preventing ceph-mds from starting

On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T <neale@xxxxxxxx> wrote:
> In the last week we have made a few changes to the down filesystem in an attempt to fix what we thought was an inode problem:
>
> cephfs-data-scan scan_extents  # about 1 day with 64 processes
> cephfs-data-scan scan_inodes   # about 1 day with 64 processes
> cephfs-data-scan scan_links    # about 1 day

Did you reset the journals or perform any other disaster recovery commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated inodes, until it stopped. All looked well until we began bringing production services back online, at which point many error messages appeared, the mds went back into damaged, and the fs back to degraded. At this point I removed the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
> -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc: 258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this probably requires significant intervention and (meta)data loss for recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)
- Reset the MDS journal [1] and optionally recover any dentries first. (This will hopefully resolve the ESubtreeMap errors you pasted.) Note that some metadata may be lost through this command.
- `cephfs-data-scan scan_links` again. This should repair any duplicate inodes (by dropping the older dentries).
- Then you can try marking the rank as repaired.

Good luck!

[1] https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D