Re: Help! Luminous 12.2.5 CephFS - MDS crashed and now won't start (failing at MDCache::add_inode)


 



So my colleague Sean Crosby and I were looking through the logs (with debug mds = 10) and found, just before the crash, references to an inode number. We converted it from hex to decimal and got something like 109953*5*627776 (the last few digits are not necessarily correct). We bumped one digit up, i.e. to 109953*6*627776, and used that as the value for take_inos, i.e.:

$ cephfs-table-tool all take_inos 1099536627776
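To make the "bump one digit" arithmetic above concrete, here is the conversion we did, sketched in Python. The exact trailing digits from our log were uncertain, so the inode value below is illustrative, not the real number:

```python
# Illustrative arithmetic for the "bump one digit" step described above.
# The trailing digits of the real inode were uncertain; these are examples.
ino_hex = hex(1099535627776)   # the inode as it appeared (hex) in the log
ino_dec = int(ino_hex, 16)     # back to decimal: 1099535627776

# Bump the digit one place above the uncertain tail, so take_inos lands
# safely past any inode number that may actually be in use:
safe_max = ino_dec + 10**6
print(safe_max)                # 1099536627776
```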


After that, the MDS could start successfully and we have a HEALTH_OK cluster once more!


It would still be useful if `show inode` in cephfs-table-tool actually showed us the max inode number, though. And take_inos should be documented in the Disaster Recovery guide as well. :)


We'll be monitoring the cluster for the next few days. Hopefully nothing too interesting to share after this! 😉 


Cheers,

Linh


From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Linh Vu <vul@xxxxxxxxxxxxxx>
Sent: Monday, 25 June 2018 7:06:45 PM
To: ceph-users
Subject: [ceph-users] Help! Luminous 12.2.5 CephFS - MDS crashed and now won't start (failing at MDCache::add_inode)
 

Hi all,


We have a Luminous 12.2.5 cluster, running just CephFS with 1 active and 1 standby MDS. The active MDS crashed and now won't start again, with this same error every time:


#######

     0> 2018-06-25 16:11:21.136203 7f01c2749700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7f01c2749700 time 2018-06-25 16:11:21.133236
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/MDCache.cc: 262: FAILED assert(!p)
#######

Right before that point is just a bunch of client connection requests.

There are also a few other inode errors such as:

#######
2018-06-25 09:30:37.889166 7f934c1e5700 -1 log_channel(cluster) log [ERR] : loaded dup inode 0x1000098f00a [2,head] v3426852030 at ~mds0/stray5/1000098f00a, but inode 0x1000098f00a.head v3426838533 already exists at ~mds0/stray2/1000098f00a
#######

We've done this for recovery:

# make sure all MDS are shut down (all crashed by this point anyway)
$ ceph fs set myfs cluster_down true
$ cephfs-journal-tool journal export backup.bin
$ cephfs-journal-tool event recover_dentries summary
Events by type:
  FRAGMENT: 9
  OPEN: 29082
  SESSION: 15
  SUBTREEMAP: 241
  UPDATE: 171835
Errors: 0
$ cephfs-table-tool all reset session
{
    "0": {
        "data": {},
        "result": 0
    }
}
$ cephfs-table-tool all reset inode
{
    "0": {
        "data": {},
        "result": 0
    }
}
$ cephfs-journal-tool --rank=myfs:0 journal reset

old journal was 35714605847583~423728061

new journal start will be 35715031236608 (1660964 bytes past old end)
writing journal head
writing EResetJournal entry
done
$ ceph mds fail 0
$ ceph fs reset hpc_projects --yes-i-really-mean-it
# start up MDS again

However, we keep getting the same error as above.

We found this: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-December/023136.html which describes a similar issue and suggests using the cephfs-table-tool take_inos command, as our problem looks like we can't create new inodes. However, we don't quite understand the `show inode` or `take_inos` commands. On our cluster, we see this:

$ cephfs-table-tool 0 show inode
{
    "0": {
        "data": {
            "version": 1,
            "inotable": {
                "projected_free": [
                    {
                        "start": 1099511627776,
                        "len": 1099511627776
                    }
                ],
                "free": [
                    {
                        "start": 1099511627776,
                        "len": 1099511627776
                    }
                ]
            }
        },
        "result": 0
    }
}

Our test cluster shows the exact same output. Running `cephfs-table-tool all take_inos 100000` on the test cluster doesn't seem to change the output above, and the inode numbers of newly created files don't jump +100K from where they were (likely we've misunderstood how take_inos works). On the test cluster (no recovery or reset has ever been run on it), the latest max inode, from creating files and running `ls -li`, is 1099511627792, just a tiny bit bigger than the "start" value above, which seems to match the number of files we've created on it.
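Our rough mental model of what take_inos does, sketched below, is that it marks every inode number up to the given value as used, trimming the front of each free range in the inotable. This is an assumption from observed behaviour, not from the Ceph source, but it would explain why `take_inos 100000` changed nothing for us: 100000 is far below the free range's "start", so there is nothing to trim. Field names mirror the `show inode` JSON:

```python
# Sketch (our assumption) of how take_inos trims the inotable free ranges:
# every inode number <= max_ino is treated as used.
def take_inos(free_ranges, max_ino):
    out = []
    for r in free_ranges:
        start, length = r["start"], r["len"]
        end = start + length              # one past the last free ino
        if end <= max_ino + 1:
            continue                      # range entirely consumed
        new_start = max(start, max_ino + 1)
        out.append({"start": new_start, "len": end - new_start})
    return out

free = [{"start": 1099511627776, "len": 1099511627776}]

# take_inos 100000: below the range, so the table is unchanged
print(take_inos(free, 100000))
# -> [{'start': 1099511627776, 'len': 1099511627776}]

# take_inos 1099536627776: the free range now starts past that value
print(take_inos(free, 1099536627776))
# -> [{'start': 1099536627777, 'len': 1099486627775}]
```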

How do we find out what is our latest max inode on our production cluster, when `show inode` doesn't seem to show us anything useful? 
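For what it's worth, our current working theory (an assumption on our part, not confirmed from the Ceph source) of why the assert fires: MDCache::add_inode asserts the incoming inode isn't already in the cache, and after `reset inode` the table's free range starts back at the base value, so the MDS hands out inode numbers that existing files already hold. A toy illustration:

```python
# Toy illustration (our assumption) of the collision behind FAILED assert(!p):
# after `reset inode`, the free range starts back at the base, so the next
# allocation can be an inode number an existing file already uses.
free_start = 1099511627776                       # "start" from `show inode`

def next_alloc():
    return free_start                            # first ino handed out

existing_inos = {1099511627776, 1099511627792}   # inos seen via `ls -li`

print(next_alloc() in existing_inos)             # True: collision -> crash
```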

Also, FYI: over a week ago we had a network failure and had to perform recovery then. That recovery seemed OK, but some clients were still running jobs from before and appeared to have recovered, so we were still draining and rebooting them as their jobs finished. Some came back with bad files, but nothing that caused trouble until now.

Very much appreciate any help!

Cheers,

Linh

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
