Hi,

Today while debugging something we had a few questions that might lead to
improving the cephfs forward scrub docs:
https://docs.ceph.com/en/latest/cephfs/scrub/

tl;dr:
1. Should we document which sorts of issues the forward scrub is able to fix?
2. Can we make it more visible (in the docs) that scrubbing is not supported
   with multi-mds?
3. Isn't the new `ceph -s` scrub task status misleading with multi-mds?

Details here:

1) We found a CephFS directory with a number of zero-sized files:

# ls -l
...
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:58 upload_fc501199e3e7abe6b574101cf34aeefb.png
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 12:23 upload_fce4f55348185fefa0abdd8d11095ba8.gif
-rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:54 upload_fd95b8358851f0dac22fb775046a6163.png
...

The user claims that those files were non-zero sized last week. The sequence
of zero-sized files includes *all* files written between Nov 2 and 9. The
user claims that his client was running out of memory, but that this is now
fixed. So I suspect that his ceph client (kernel 3.10.0-1127.19.1.el7.x86_64)
was not behaving well.

Anyway, I noticed that even though the dentries list 0 bytes, the underlying
rados objects have data, and the data looks good. E.g.:

# rados get -p cephfs_data 200212e68b5.00000000 --namespace=xxx 200212e68b5.00000000
# file 200212e68b5.00000000
200212e68b5.00000000: PNG image data, 960 x 815, 8-bit/color RGBA, non-interlaced

So I managed to recover the files with something like the script in [0],
using an input file mapping inode to filename.

But I'm wondering: is a forward scrub able to fix this sort of problem
directly? Should we document which sorts of issues the forward scrub is able
to fix?

I tried to scrub it anyway, which led to:

# ceph tell mds.cephflax-mds-xxx scrub start /volumes/_nogroup/xxx recursive repair
Scrub is not currently supported for multiple active MDS. Please reduce max_mds to 1 and then scrub.

So ...
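As an aside, the inode-to-object-name mapping used in the recovery above can be wrapped in a tiny shell helper. This is just a sketch (the function name `ino_to_obj` is mine), assuming the standard CephFS data object naming of `<inode in hex>.<chunk index as 8 hex digits>`:

```shell
# Map a decimal inode number and chunk index to a CephFS data object name.
# Assumes standard naming: <inode in hex>.<chunk index, 8 hex digits>.
# Helper name is hypothetical, not a ceph CLI command.
ino_to_obj() {
    printf '%x.%08x\n' "$1" "$2"
}

ino_to_obj 2199579945141 0   # -> 200212e68b5.00000000 (the object fetched above)
```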
2) Shouldn't we update the doc to mention loud and clear that scrub is not
currently supported with multiple active MDS?

3) I was somewhat surprised by this, because I had thought that the new
`ceph -s` multi-mds scrub status implied that multi-mds scrubbing now works:

  task status:
    scrub status:
        mds.x: idle
        mds.y: idle
        mds.z: idle

Is it worth reporting this task status for a cephfs if we can't even scrub it?

Thanks!!

Dan

[0] A small script that prints the recovery commands (a dry run; review the
output, then pipe it to a shell to execute). Note it only handles the first
10 RADOS objects (chunks) of each file:

mkdir -p recovered
while read -r a b; do
  # a = inode number (decimal), b = original filename
  for i in {0..9}; do
    echo "rados stat --cluster=flax --pool=cephfs_data --namespace=xxx" $(printf "%x" $a).0000000$i "&&" "rados get --cluster=flax --pool=cephfs_data --namespace=xxx" $(printf "%x" $a).0000000$i $(printf "%x" $a).0000000$i
  done
  # reassemble the downloaded chunks and move the result into place
  echo cat $(printf "%x" $a).* ">" $(printf "%x" $a)
  echo mv $(printf "%x" $a) recovered/$b
done < inones_fnames.txt

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx