Hi,

I had a chance to catch John Spray at the Ceph Day, and he suggested that I try to reproduce this bug in Luminous.

To fix my immediate problem we discussed 2 ideas:

1. Manually edit the metadata. Unfortunately I was not able to find any information on how the metadata is structured :-(

2. Edit the code to set the link count to 0 if it is negative:

diff --git a/src/mds/StrayManager.cc b/src/mds/StrayManager.cc
index 9e53907..2ca1449 100644
--- a/src/mds/StrayManager.cc
+++ b/src/mds/StrayManager.cc
@@ -553,6 +553,10 @@ bool StrayManager::__eval_stray(CDentry *dn, bool delay)
      logger->set(l_mdc_num_strays_delayed, num_strays_delayed);
    }
 
+  if (in->inode.nlink < 0) {
+    in->inode.nlink=0;
+  }
+
   // purge?
   if (in->inode.nlink == 0) {
     // past snaprealm parents imply snapped dentry remote links.
diff --git a/src/xxHash b/src/xxHash
--- a/src/xxHash
+++ b/src/xxHash
@@ -1 +1 @@

I'm not sure if this works. The patched mds no longer crashes, however I expected that this value:

root@mds02:~ # ceph daemonperf mds.1
-----mds------ --mds_server-- ---objecter--- -----mds_cache----- ---mds_log----
rlat inos caps|hsr  hcs  hcr |writ read actv|recd recy stry purg|segs evts subm|
   0 100k    0|   0    0    0|   0    0    0|   0    0 625k    0|  30  25k    0|
                                                       ^^^^

should go down, but it stays at 625k. Unfortunately I don't have another system to compare with.

After I started the patched mds once, I reverted back to an unpatched mds, and it also stopped crashing, so I guess it did "fix" something.

A question just out of curiosity: I tried to log these events with something like:

dout(10) << "Fixed negative inode count";

or

derr << "Fixed negative inode count";

But my compiler yelled at me for trying this.

Micha Krause
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
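
For reference, the intent of the StrayManager patch above can be sketched outside the Ceph tree. This is only an illustration under stated assumptions: InodeSketch and clamp_negative_nlink are hypothetical names, the link count is assumed to be a signed 32-bit value, and a plain std::ostream stands in for the MDS logging macros. It is not the real Ceph inode type or logging API.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the inode field touched by the patch.
// Assumption: nlink is signed, so it can actually become negative.
struct InodeSketch {
  std::int32_t nlink;
};

// Clamp a negative link count to zero and write one log line.
// Returns true if the value had to be corrected, false otherwise.
bool clamp_negative_nlink(InodeSketch &in, std::ostream &log) {
  if (in.nlink >= 0) {
    return false;  // nothing to fix
  }
  log << "fixed negative nlink " << in.nlink << " -> 0\n";
  in.nlink = 0;
  return true;
}

// Small demonstration: corrects one bad inode and returns the log text.
std::string demo_clamp() {
  InodeSketch in{-3};
  std::ostringstream log;
  bool fixed = clamp_negative_nlink(in, log);
  assert(fixed && in.nlink == 0);
  return log.str();
}
```

Note: in the Ceph tree, dout and derr statements are conventionally terminated with << dendl, which the two logging attempts above omit. That might explain the compile error, but it is an assumption and has not been verified against this build.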