I've created a tracker https://tracker.ceph.com/issues/54557 to track this issue.
Thanks, Chris, for bringing this to my attention.

Regards,
Milind

On Sun, Mar 13, 2022 at 1:11 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> Hi Milind (or anyone else who can help...)
>
> Reading this thread made me realise I had overlooked cephfs scrubbing, so
> I tried it on a small 16.2.7 cluster. The normal forward scrub showed
> nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did find
> one backtrace error (putting the cluster into HEALTH_ERR). I then did a
> repair which according to the log did rewrite the inode, and subsequent
> scrubs have not found it.
>
> However the cluster health is still ERR, and the MDS still shows the
> damage:
>
> ceph@xxxx1:~$ ceph tell mds.0 damage ls
> 2022-03-12T18:42:01.609+0000 7f1b817fa700 0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
> 2022-03-12T18:42:01.625+0000 7f1b817fa700 0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
> [
>     {
>         "damage_type": "backtrace",
>         "id": 3308827822,
>         "ino": 256,
>         "path": "~mds0"
>     }
> ]
>
> What are the right steps from here? Has the error actually been corrected
> and just needs clearing, or is it still there?
>
> In case it is relevant: there is one active and two standby MDS. The log
> is from the node currently hosting rank 0.
>
> From the mds log:
>
> 2022-03-12T18:13:41.593+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:45.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:13:52.881+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:55.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:14:12.608+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:15.316+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
>
> Thanks, Chris
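A side note for anyone who lands here with the same symptom: assuming the repair really did rewrite the backtrace (as the log above suggests), one possible way to tidy up is to clear the stale entry from the damage table and re-scrub to confirm. This is only a sketch, not an official recovery procedure; `damage rm` merely deletes the report, it does not repair anything, and the id below is simply the one from the `damage ls` output above.

    # list recorded damage for rank 0
    ceph tell mds.0 damage ls
    # drop the stale entry by id once you are satisfied the repair succeeded
    ceph tell mds.0 damage rm 3308827822
    # re-run the scrub to confirm nothing is re-reported
    ceph tell mds.0 scrub start ~mdsdir recursive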
>
> On 11/03/2022 12:24, Milind Changire wrote:
>
> Here are some answers to your questions:
>
> On Sun, Mar 6, 2022 at 3:57 AM Arnaud M <arnaud.meauzoone@xxxxxxxxx> wrote:
>
> Hello to everyone :)
>
> Just some questions about filesystem scrubbing.
>
> In this documentation it is said that scrub will help admins check the
> consistency of the filesystem:
> https://docs.ceph.com/en/latest/cephfs/scrub/
>
> So my questions are:
>
> Is filesystem scrubbing mandatory ?
> How often should I scrub the whole filesystem (ie start at /) ?
> How often should I scrub ~mdsdir ?
> Should I set up a cronjob ?
> Is filesystem scrubbing considered harmless ? Even with recursive force
> repair ?
> Is there any chance for scrubbing to overload the mds on a big filesystem
> (ie like find . -ls) ?
> What is the difference between "recursive repair" and "recursive force
> repair" ? Is "force" harmless ?
> Is there any way to see which file/folder the scrub operation is at ? In
> fact, any better way to see scrub progress than "scrub status", which
> doesn't say much ?
>
> Sorry for all the questions, but there is not that much documentation about
> filesystem scrubbing. And I do think the answers will help a lot of cephfs
> administrators :)
>
> Thanks to all
>
> All the best
>
> Arnaud
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> 1. Is filesystem scrubbing mandatory ?
>
> As a routine system administration practice, it is good to ensure that
> your file-system is always in a good state. To avoid getting the
> file-system into a bottleneck state during work hours, it's always a good
> idea to reserve some time to run a recursive forward scrub and use the
> in-built scrub automation to fix any issues found. Although you can run the
> scrub at any directory of your choice, it's always a good practice to start
> the scrub at the file-system root once in a while.
>
> So file-system scrubbing is not mandatory but highly recommended.
>
> Filesystem scrubbing is designed to read CephFS' metadata and detect
> inconsistencies or issues that are generated by bitrot or bugs, just as
> RADOS' pg scrubbing is. In a perfect world without bugs or bit flips it
> would be unnecessary, but we don't live in that world, so a scrub can
> detect small issues before they turn into big ones, and the mere act of
> reading data can keep it fresh and give storage devices a chance to correct
> any media errors while that's still possible.
>
> We don't have a specific recommended schedule, and scrub takes up cluster IO
> and compute resources, so its frequency should be tailored to your workload.
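To make the above concrete, the operations being discussed boil down to a handful of MDS commands. This is only an illustrative sketch; "cephfs" is a placeholder for your own file-system name and rank 0 is assumed:

    # forward scrub of the whole tree, starting at the root
    ceph tell mds.cephfs:0 scrub start / recursive
    # scrub the stray/metadata directory as well
    ceph tell mds.cephfs:0 scrub start ~mdsdir recursive
    # add "repair" to have the scrub fix what it finds
    ceph tell mds.cephfs:0 scrub start / recursive,repair
    # check whether a scrub is still running
    ceph tell mds.cephfs:0 scrub status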
>
> 2. How often should I scrub the whole filesystem (ie start at /) ?
>
> Since you'd always want to have a consistent file-system, it would be good
> to run scrubbing:
>
>    1. before taking a snapshot of the entire file-system, OR
>    2. before taking a backup of the entire file-system, OR
>    3. after significant metadata activity, e.g. after creating files,
>       renaming files, deleting files, changing file attributes, etc.
>
> There's no one rule that fits all scenarios, so you'll need to follow a
> heuristic approach. The type of devices (HDD or SSD) and the amount of
> activity wearing the device are the typical factors involved when deciding
> to scrub a file-system. If you have some window dedicated to backup
> activity, then you'd want to run a recursive forward scrub with repair on
> the entire file-system before it is snapshotted and used for backup.
> Although you can run a scrub alongside active use of the file-system, it
> is always recommended that you run the scrub on a quiet file-system so that
> neither activity gets in the other's way. This also helps in completing
> the scrub task quicker.
>
> 3. How often should I scrub ~mdsdir ?
>
> ~mdsdir is used to collect deleted (stray) entries. So, the number of
> file/dir unlinks in a typical workload should be used to come up with a
> heuristic for how often to scrub it. This activity can be taken up
> separately from scrubbing the file-system root.
>
> 4. Should I set up a cron job ?
>
> Yes, you could.
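Purely as an illustration of what such a cron job could look like (the schedule, the file-system name "cephfs", and the logging are placeholders; you would also want to make sure the previous run has finished, e.g. by checking `scrub status` first):

    # /etc/cron.d/cephfs-scrub -- illustrative only
    0 2 * * 0   root  ceph tell mds.cephfs:0 scrub start / recursive 2>&1 | logger -t cephfs-scrub
    30 3 * * 0  root  ceph tell mds.cephfs:0 scrub start ~mdsdir recursive 2>&1 | logger -t cephfs-scrub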
>
> 5. Is filesystem scrubbing considered harmless ? Even with recursive force
> repair ?
>
> Yes, scrubbing even with repair is harmless.
>
> Scrubbing with repair does the following things:
>
>    1. Repair backtrace
>       If the on-disk and in-memory backtraces don't match, then the DIRTYPARENT
>       flag is set so that the journal logger thread picks up the inode for
>       writing the backtrace to disk.
>    2. Repair inode
>       If the on-disk and in-memory inode versions don't match, then the inode is
>       left untouched. Otherwise, if the inode is marked as "free", the inode
>       number is removed from active use.
>    3. Repair recursive-stats
>       If the on-disk and in-memory raw-stats don't match, then all the stats for
>       the leaves in the directory tree are marked dirty and a scatter-gather
>       operation is forced to coalesce the raw-stats info.
>
> 6. Is there any chance for scrubbing to overload the mds on a big file system,
> ie. like find . -ls ?
>
> Scrubbing on its own should not be able to overload an MDS, but it is an
> additional load on top of whatever client activity the MDS is serving,
> which could exceed the server's capacity. In short, yes, it might
> overload the mds in sustained high-I/O scenarios.
> The mds config option mds_max_scrub_ops_in_progress, which defaults to
> 5, decides the number of scrubs running at any given time. So, there is a
> small effort at throttling.
>
> 7. What is the difference between "recursive repair" and "recursive force
> repair" ? Is "force" harmless ?
>
> If the "force" argument is specified, then a dirfrag is scrubbed only if
>
>    1. the dentry version is greater than the last scrub version, AND
>    2. the dentry type is a DIR.
>
> If "force" is not specified, then dirfrag scrubbing is skipped. You will be
> able to see an mds log message saying that scrubbing was skipped for the
> dentry.
>
> The rest of the scrubbing is done as described in Q5 above.
>
> 8. Is there any way to see which file/folder the scrub operation is at ? In
> fact, any better way to see scrub progress than "scrub status", which
> doesn't say much ?
>
> Currently there's no way to see which file/folder is being scrubbed. At
> most we could log a line in the mds logs about it, but it could soon cause
> the logs to bloat if the number of entries is large.
>
> --
> Milind

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx