Update to Mimic with prior Snapshots leads to MDS damaged metadata


 



Hi,

I upgraded a Ceph cluster to Mimic yesterday following the release
notes. Specifically, I stopped all standby MDS daemons and then
restarted the only active MDS with the new version.

The cluster was originally installed with Luminous. Its CephFS volume
had snapshots prior to the upgrade, but only one active MDS.

The post-installation steps failed, though:
 ceph daemon mds.<id> scrub_path /
returned an error, which I corrected with
 ceph daemon mds.<id> scrub_path / repair

while
 ceph daemon mds.<id> scrub_path '~mdsdir'
did not show any errors.
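For reference, the full post-upgrade check sequence was roughly the following sketch (run on the host of the active MDS; `<id>` is the daemon name, and passing the `recursive` scrub option alongside `repair` is an assumption on my part, not something I verified):

```shell
# Forward scrub from the filesystem root, repairing what it can
ceph daemon mds.<id> scrub_path / repair

# Scrub the internal MDS directory tree as well
ceph daemon mds.<id> scrub_path '~mdsdir'

# Possibly also walk the whole tree recursively (option combination assumed)
ceph daemon mds.<id> scrub_path / recursive repair

# Afterwards, inspect what damage the MDS has recorded
ceph tell mds.<id> damage ls
```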


After some time, ceph health reported damaged MDS metadata:
> ceph tell mds.<id> damage ls | jq '.[].damage_type' | sort | uniq -c
    398 "backtrace"
    718 "dentry"

Examples of damage:

{
  "damage_type": "dentry",
  "id": 118195760,
  "ino": 1099513350198,
  "frag": "000100*",
  "dname": "1524578400.M820820P705532.dovecot-15-hgjlx,S=425674,W=431250:2,RS",
  "snap_id": "head",
  "path": "/path/to/mails/user/Maildir/.Trash/cur/1524578400.M820820P705532.dovecot-15-hgjlx,S=425674,W=431250:2,RS"
},
{
  "damage_type": "backtrace",
  "id": 121083841,
  "ino": 1099515215027,
  "path": "/path/to/mails/other_user/Maildir/.Junk/cur/1528189963.M416032P698926.dovecot-15-xmpkh,S=4010,W=4100:2,Sab"
}
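With over a thousand entries, the `damage ls` output can also be grouped in Python rather than with jq. A minimal sketch, using only the two entries quoted above (the field names come from that output; nothing beyond them is assumed):

```python
import json
from collections import Counter

# Excerpt of `ceph tell mds.<id> damage ls` output, taken from the
# two example entries quoted in this mail.
damage_ls_output = """
[
  {"damage_type": "dentry", "id": 118195760, "ino": 1099513350198,
   "frag": "000100*", "snap_id": "head",
   "dname": "1524578400.M820820P705532.dovecot-15-hgjlx,S=425674,W=431250:2,RS",
   "path": "/path/to/mails/user/Maildir/.Trash/cur/1524578400.M820820P705532.dovecot-15-hgjlx,S=425674,W=431250:2,RS"},
  {"damage_type": "backtrace", "id": 121083841, "ino": 1099515215027,
   "path": "/path/to/mails/other_user/Maildir/.Junk/cur/1528189963.M416032P698926.dovecot-15-xmpkh,S=4010,W=4100:2,Sab"}
]
"""

entries = json.loads(damage_ls_output)

# Count entries per damage type (same result as the jq | sort | uniq -c pipe)
counts = Counter(e["damage_type"] for e in entries)
print(dict(counts))  # -> {'dentry': 1, 'backtrace': 1}

# List the affected paths per damage type
for dtype in counts:
    for e in entries:
        if e["damage_type"] == dtype:
            print(dtype, e["path"])
```

This makes it easy to extract, say, only the directories containing damaged dentries for further inspection.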


Directories containing damaged entries can still be listed through the
kernel CephFS mount (4.16.7), but not through the FUSE mount, which
stalls.


Can anyone help? This is unfortunately a production cluster.

Regards,
 Tobias Florek
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
