Help! Luminous 12.2.5 CephFS - MDS crashed and now won't start (failing at MDCache::add_inode)

Linh Vu <vul@xxxxxxxxxxxxxx> · Mon, 25 Jun 2018 09:06:45 +0000

Hi all,

We have a Luminous 12.2.5 cluster running only CephFS, with 1 active and 1 standby MDS. The active MDS crashed and now won't start again, failing each time with the same error:

#######

     0> 2018-06-25 16:11:21.136203 7f01c2749700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/MDCache.cc: In function 'void MDCache::add_inode(CInode*)' thread 7f01c2749700 time 2018-06-25 16:11:21.133236
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/mds/MDCache.cc: 262: FAILED assert(!p)
#######

Right before that point, the log shows only a bunch of client connection requests.

There are also a few other inode errors such as:

#######
2018-06-25 09:30:37.889166 7f934c1e5700 -1 log_channel(cluster) log [ERR] : loaded dup inode 0x1000098f00a [2,head] v3426852030 at ~mds0/stray5/1000098f00a, but inode 0x1000098f00a.head v3426838533 already exists at ~mds0/stray2/1000098f00a

#######
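
To collect the inode numbers from errors like the one above, we could scan the MDS log with something like the following Python sketch. The regular expression is modelled on that single quoted line only, so it is an assumption that other occurrences have the same wording; the function names are ours.

```python
import re

# Sample copied from the log excerpt above.
SAMPLE = (
    "2018-06-25 09:30:37.889166 7f934c1e5700 -1 log_channel(cluster) log [ERR] : "
    "loaded dup inode 0x1000098f00a [2,head] v3426852030 at ~mds0/stray5/1000098f00a, "
    "but inode 0x1000098f00a.head v3426838533 already exists at ~mds0/stray2/1000098f00a"
)

# Pattern based on the one sample line; the wording in other versions may differ.
DUP_RE = re.compile(
    r"loaded dup inode (0x[0-9a-f]+) \[[^\]]*\] v(\d+) at (\S+), "
    r"but inode (0x[0-9a-f]+)\.head v(\d+) already exists at (\S+)"
)


def parse_dup_inode(line):
    """Return a dict describing one dup-inode error, or None if the line does not match."""
    m = DUP_RE.search(line)
    if m is None:
        return None
    return {
        "ino": int(m.group(1), 16),
        "new_version": int(m.group(2)),
        "new_path": m.group(3),
        "old_version": int(m.group(5)),
        "old_path": m.group(6),
    }


def max_dup_inode(lines):
    """Return the largest inode number seen in dup-inode errors, or None if there are none."""
    inos = [d["ino"] for d in map(parse_dup_inode, lines) if d is not None]
    return max(inos) if inos else None


info = parse_dup_inode(SAMPLE)
print(hex(info["ino"]), info["new_path"], info["old_path"])
```

Note that this only reports inodes that happen to appear in error messages, so it gives a lower bound at best and not the true max inode.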

We've done this for recovery:

# make sure all MDS daemons are shut down (all had crashed by this point anyway)
$ ceph fs set myfs cluster_down true

$ cephfs-journal-tool journal export backup.bin

$ cephfs-journal-tool event recover_dentries summary
Events by type:
  FRAGMENT: 9
  OPEN: 29082
  SESSION: 15
  SUBTREEMAP: 241
  UPDATE: 171835
Errors: 0
$ cephfs-table-tool all reset session

{
    "0": {
        "data": {},
        "result": 0
    }
}
$ cephfs-table-tool all reset inode

{
    "0": {
        "data": {},
        "result": 0
    }
}
$ cephfs-journal-tool --rank=myfs:0 journal reset

old journal was 35714605847583~423728061

new journal start will be 35715031236608 (1660964 bytes past old end)
writing journal head
writing EResetJournal entry
done
$ ceph mds fail 0

$ ceph fs reset hpc_projects --yes-i-really-mean-it

# start the MDS daemons again

However, we keep getting the same error as above.

We found this thread: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-December/023136.html, which describes a similar issue, along with some suggestions on using the cephfs-table-tool take_inos command, since our problem looks like the MDS can't create new inodes. However, we don't quite understand the show inode or take_inos commands. On our cluster, we see this:

$ cephfs-table-tool 0 show inode
{
    "0": {
        "data": {
            "version": 1,
            "inotable": {
                "projected_free": [
                    {
                        "start": 1099511627776,
                        "len": 1099511627776
                    }
                ],
                "free": [
                    {
                        "start": 1099511627776,
                        "len": 1099511627776
                    }
                ]
            }
        },
        "result": 0
    }
}
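
To make these numbers easier to read, here is a small Python sketch that converts the free range to hex and checks whether the inode from the "loaded dup inode" error falls inside it. It assumes each entry is a (start, len) pair covering start up to, but not including, start + len; that is our reading of the output and not something we have confirmed. The JSON is trimmed to the "free" list for rank 0, and the helper names are ours.

```python
import json

# Trimmed copy of the `show inode` output above (rank 0, "free" list only).
SHOW_INODE = """
{"0": {"data": {"version": 1,
                "inotable": {"free": [{"start": 1099511627776,
                                       "len": 1099511627776}]}},
       "result": 0}}
"""

# Inode number taken from the "loaded dup inode" error in the MDS log.
DUP_INO = 0x1000098f00a


def free_ranges(show_inode_json, rank="0"):
    """Return the free list as (start, end_exclusive) pairs.

    Assumption: each entry means [start, start + len).
    """
    table = json.loads(show_inode_json)[rank]["data"]["inotable"]
    return [(r["start"], r["start"] + r["len"]) for r in table["free"]]


def is_marked_free(ino, ranges):
    """True if ino falls inside any of the free ranges."""
    return any(start <= ino < end for start, end in ranges)


ranges = free_ranges(SHOW_INODE)
for start, end in ranges:
    print(f"free: {start:#x} up to (not including) {end:#x}")
print(f"{DUP_INO:#x} marked free: {is_marked_free(DUP_INO, ranges)}")
```

Both "start" and "len" are exactly 2^40. If our reading of the ranges is right, the table lists 0x1000098f00a as free even though the log shows that inode already exists, which would fit our guess that the MDS fails when it tries to allocate new inodes.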

Our test cluster shows exactly the same output. Running `cephfs-table-tool all take_inos 100000` on the test cluster doesn't seem to change that output, and the inode numbers of newly created files don't jump by 100K from where they were (most likely we have misunderstood how take_inos works). On our test cluster (on which no recovery or reset has been run), the latest max inode, taken from creating files and running `ls -li`, is 1099511627792, just slightly larger than the "start" value above, which seems to match the number of files we've created on it.

How do we find out the latest max inode on our production cluster, when `show inode` doesn't seem to show us anything useful?
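
In case it helps, here is the arithmetic behind the test-cluster numbers above as a short Python sketch. The variable names are ours, and the last comparison only matters if take_inos treats its argument as an absolute inode number to consume up to, and not as a count; that is an assumption we have not verified.

```python
TABLE_START = 1099511627776    # "start" reported by `show inode`
TEST_MAX_INO = 1099511627792   # highest inode seen via `ls -li` on the test cluster
TAKE_INOS_ARG = 100000         # value we passed to take_inos on the test cluster

# How far past the start of the table the test cluster has got.
offset = TEST_MAX_INO - TABLE_START
print(f"start        = {TABLE_START:#x}")
print(f"test max ino = {TEST_MAX_INO:#x}")
print(f"offset       = {offset}")

# Assumption: take_inos takes an absolute inode number, not a count.
# Under that reading, an argument below the table start would change nothing.
below_start = TAKE_INOS_ARG < TABLE_START
print(f"take_inos argument below table start: {below_start}")
```

The offset is 16, which is in line with the handful of files we created. Under the stated assumption, 100000 is far below the table start, which might explain why the command appeared to do nothing on the test cluster.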

Also, FYI: over a week ago we had a network failure and had to perform a recovery then. That recovery seemed OK, but some clients were still running jobs from before the failure and appeared to have recovered, so we were still in the process of draining and rebooting them as they finished their jobs. Some came back with bad files, but nothing caused trouble until now.

Very much appreciate any help!

Cheers,
Linh
