On Thu, Jul 1, 2021 at 12:53 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
>
> Hi Dan,
>
> Sorry for the very late reply -- I'm going through old unanswered email.
>
> On Mon, Nov 9, 2020 at 4:13 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > Today while debugging something we had a few questions that might lead
> > to improving the cephfs forward scrub docs:
> > https://docs.ceph.com/en/latest/cephfs/scrub/
> >
> > tldr:
> > 1. Should we document which sorts of issues the forward scrub is
> > able to fix?
>
> Yes, I've made a ticket: https://tracker.ceph.com/issues/51459

Great, thanks!

> > 2. Can we make it more visible (in docs) that scrubbing is not
> > supported with multi-mds?
>
> This is no longer the case since Pacific, as you probably know.
>
> > 3. Isn't the new `ceph -s` scrub task status misleading with multi-mds?
> >
> > Details here:
> >
> > 1) We found a CephFS directory with a number of zero-sized files:
> >
> > # ls -l
> > ...
> > -rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:58 upload_fc501199e3e7abe6b574101cf34aeefb.png
> > -rw-r--r-- 1 1001890000 1001890000 0 Nov 3 12:23 upload_fce4f55348185fefa0abdd8d11095ba8.gif
> > -rw-r--r-- 1 1001890000 1001890000 0 Nov 3 11:54 upload_fd95b8358851f0dac22fb775046a6163.png
> > ...
> >
> > The user claims that those files were non-zero sized last week. The
> > sequence of zero-sized files includes *all* files written between Nov 2 and 9.
> > The user claims that his client was running out of memory, but this is
> > now fixed. So I suspect that his ceph client (kernel
> > 3.10.0-1127.19.1.el7.x86_64) was not behaving well.
> >
> > Anyway, I noticed that even though the dentries list 0 bytes, the
> > underlying rados objects have data, and the data looks good. E.g.
> >
> > # rados get -p cephfs_data 200212e68b5.00000000 --namespace=xxx 200212e68b5.00000000
> > # file 200212e68b5.00000000
> > 200212e68b5.00000000: PNG image data, 960 x 815, 8-bit/color RGBA, non-interlaced
> >
> > So I managed to recover the files doing something like this (using an
> > input file mapping inode to filename) [see PS 0].
> >
> > But I'm wondering if a forward scrub is able to fix this sort of
> > problem directly?
>
> Someday perhaps, but not yet. It's not clear, though, that this is
> something the MDS should repair. The client clearly didn't flush the
> dirty size to the MDS yet. This is one of those situations where the
> client has done write() but not yet fsync(), logically.

The root cause in this case turned out to be
https://bugzilla.redhat.com/show_bug.cgi?id=1710751
We haven't seen this again after updating client kernels.

Best Regards,

Dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
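
For illustration, a minimal sketch of the kind of recovery loop Dan describes above might look like the following. It is not the actual script from "[PS 0]" (not included here); it assumes a hypothetical input file inode_to_name.txt with lines of the form "<hex inode> <output filename>", and reuses the cephfs_data pool and xxx namespace from the rados get command shown earlier.

    #!/bin/bash
    # Sketch only: reassemble a file's contents from its CephFS data objects.
    # Assumes inode_to_name.txt maps hex inode numbers to output filenames.
    POOL=cephfs_data
    NS=xxx

    while read -r ino name; do
        : > "$name"      # start with an empty output file
        idx=0
        while :; do
            # CephFS data objects are named <hex inode>.<8-digit hex chunk index>,
            # e.g. 200212e68b5.00000000 as in the rados get example above.
            obj=$(printf '%s.%08x' "$ino" "$idx")
            # Append the next chunk; stop at the first object that doesn't exist.
            rados -p "$POOL" --namespace="$NS" get "$obj" /tmp/chunk 2>/dev/null || break
            cat /tmp/chunk >> "$name"
            idx=$((idx + 1))
        done
    done < inode_to_name.txt

This only recovers what is still in the data pool (non-sparse files laid out with the default object size); the recovered file size comes from the objects themselves rather than from the stale 0-byte dentry.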