Chris,

After you run "scrub repair" followed by a "scrub" without any issues, and "damage ls" still shows the error, try running "damage rm" and then re-run "scrub" to see whether the system still reports damage. Please update the upstream tracker with your findings if possible.
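Concretely, the sequence would look something like this (using rank 0 and the damage id from your "damage ls" output below; substitute your own values):

    ceph tell mds.0 scrub start ~mdsdir recursive,repair   # repair pass
    ceph tell mds.0 scrub start ~mdsdir recursive          # verify pass; should come back clean
    ceph tell mds.0 damage ls                              # if the backtrace entry is still listed...
    ceph tell mds.0 damage rm 3308827822                   # ...remove it by id
    ceph tell mds.0 scrub start ~mdsdir recursive          # confirm nothing is re-flagged

Note that "damage rm" only clears the entry from the MDS damage table; if a subsequent scrub re-reports the damage, the underlying problem is still present.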
--
Milind

On Sun, Mar 13, 2022 at 2:41 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> Ok, restarting mds.0 cleared it. I then restarted the others until this
> one was again active, and repeated the scrub of ~mdsdir, which was then clean.
>
> I don't know what caused it, or why restarting the MDS was necessary, but
> it has done the trick.
>
> On 12/03/2022 19:14, Chris Palmer wrote:
> > Hi Milind (or anyone else who can help...)
> >
> > Reading this thread made me realise I had overlooked cephfs scrubbing,
> > so I tried it on a small 16.2.7 cluster. The normal forward scrub showed
> > nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did
> > find one backtrace error (putting the cluster into HEALTH_ERR). I then
> > did a repair which, according to the log, did rewrite the inode, and
> > subsequent scrubs have not found it.
> >
> > However the cluster health is still ERR, and the MDS still shows the
> > damage:
> >
> > ceph@xxxx1:~$ ceph tell mds.0 damage ls
> > 2022-03-12T18:42:01.609+0000 7f1b817fa700 0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
> > 2022-03-12T18:42:01.625+0000 7f1b817fa700 0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
> > [
> >     {
> >         "damage_type": "backtrace",
> >         "id": 3308827822,
> >         "ino": 256,
> >         "path": "~mds0"
> >     }
> > ]
> >
> > What are the right steps from here? Has the error actually been
> > corrected but just needs clearing, or is it still there?
> >
> > In case it is relevant: there is one active MDS and two standbys. The
> > log is from the node currently hosting rank 0.
> >
> > From the mds log:
> >
> > 2022-03-12T18:13:41.593+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
> > 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> > 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> > 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
> > 2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> > 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:45.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
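A note on the scrub-error JSON above: "read_ret_val":-61 is -ENODATA, i.e. the MDS could not read the backtrace xattr off disk. For anyone who wants to inspect the raw on-disk backtrace, something along these lines should work (the metadata pool name here is a placeholder for your own, and 100.00000000 is the object backing inode 0x100):

    rados -p cephfs_metadata getxattr 100.00000000 parent > /tmp/backtrace_0x100
    ceph-dencoder type inode_backtrace_t import /tmp/backtrace_0x100 decode dump_json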
> > 2022-03-12T18:13:52.881+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:55.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
> >
> > 2022-03-12T18:14:12.608+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:14:15.316+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
> >
> > Thanks, Chris
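While a scrub like the ones above is running, its state and any resulting health flags can be checked with:

    ceph tell mds.0 scrub status
    ceph health detail

Neither shows the exact file being scrubbed (as Milind notes below), but they do show the active scrub paths and any damage-related health errors.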
> > On 11/03/2022 12:24, Milind Changire wrote:
> >> Here's some answers to your questions:
> >>
> >> On Sun, Mar 6, 2022 at 3:57 AM Arnaud M <arnaud.meauzoone@xxxxxxxxx> wrote:
> >>
> >>> Hello to everyone :)
> >>>
> >>> Just some questions about filesystem scrubbing.
> >>>
> >>> In this documentation it is said that scrub will help admins check the
> >>> consistency of the filesystem:
> >>>
> >>> https://docs.ceph.com/en/latest/cephfs/scrub/
> >>>
> >>> So my questions are:
> >>>
> >>> Is filesystem scrubbing mandatory ?
> >>> How often should I scrub the whole filesystem (ie start at /) ?
> >>> How often should I scrub ~mdsdir ?
> >>> Should I set up a cron job ?
> >>> Is filesystem scrubbing considered harmless ? Even with recursive force repair ?
> >>> Is there any chance for scrubbing to overload the mds on a big file system (ie like find . -ls) ?
> >>> What is the difference between "recursive repair" and "recursive force repair" ? Is "force" harmless ?
> >>> Is there any way to see which file/folder the scrub operation is at ? In fact, is there any better way to see scrub progress than "scrub status", which doesn't say much ?
> >>>
> >>> Sorry for all the questions, but there is not that much documentation
> >>> about filesystem scrubbing. And I do think the answers will help a lot
> >>> of cephfs administrators :)
> >>>
> >>> Thanks to all
> >>>
> >>> All the best
> >>>
> >>> Arnaud
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>
> >> 1. Is filesystem scrubbing mandatory ?
> >>
> >> As a routine system administration practice, it is good to ensure that
> >> your file-system is always in a good state. To avoid getting the
> >> file-system into a bottleneck state during work hours, it's always a
> >> good idea to reserve some quiet time to run a recursive forward scrub
> >> and let the built-in scrub automation fix any issues it finds. Although
> >> you can run the scrub at any directory of your choice, it's always good
> >> practice to start the scrub at the file-system root once in a while.
> >>
> >> So file-system scrubbing is not mandatory, but highly recommended.
> >>
> >> Filesystem scrubbing is designed to read CephFS's metadata and detect
> >> inconsistencies or issues that are generated by bitrot or bugs, just as
> >> RADOS's pg scrubbing is. In a perfect world without bugs or bit flips
> >> it would be unnecessary, but we don't live in that world, so a scrub
> >> can detect small issues before they turn into big ones, and the mere
> >> act of reading data can keep it fresh and give storage devices a chance
> >> to correct any media errors while that's still possible.
> >>
> >> We don't have a specific recommended schedule, and scrub takes up
> >> cluster IO and compute resources, so its frequency should be tailored
> >> to your workload.
> >>
> >> 2. How often should I scrub the whole filesystem (ie start at /) ?
> >>
> >> Since you'd always want to have a consistent file-system, it would be
> >> good to run scrubbing:
> >>
> >> 1. before taking a snapshot of the entire file-system, OR
> >> 2. before taking a backup of the entire file-system, OR
> >> 3. after significant metadata activity, eg. after creating, renaming,
> >>    or deleting files, changing file attributes, etc.
> >>
> >> There's no one-rule-fits-all scenario, so you'll need to follow a
> >> heuristic approach. The type of device (HDD or SSD) and the amount of
> >> activity wearing the device are the typical factors involved when
> >> deciding to scrub a file-system. If you have some window dedicated to
> >> backup activity, then you'd want to run a recursive forward scrub with
> >> repair on the entire file-system before it is snapshotted and used for
> >> backup. Although you can run a scrub alongside active use of the
> >> file-system, it is always recommended to run the scrub on a quiet
> >> file-system so that the two activities don't get in each other's way.
> >> This also helps the scrub complete more quickly.
> >>
> >> 3. How often should I scrub ~mdsdir ?
> >>
> >> ~mdsdir is used to collect deleted (stray) entries. So, the number of
> >> file/dir unlinks in a typical workload should be used to come up with
> >> a heuristic for scrubbing it. This activity can be taken up separately
> >> from scrubbing the file-system root.
> >>
> >> 4. Should I set up a cron job ?
> >>
> >> Yes, you could.
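For example, a weekly off-peak scrub of the whole filesystem could be scheduled with a crontab entry along these lines (the filesystem name "cephfs", the schedule, and the assumption that the cron user has a working ceph keyring are all placeholders to adapt):

    # every Sunday at 03:00, forward-scrub the filesystem from the root
    0 3 * * 0  ceph tell mds.cephfs:0 scrub start / recursive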
> >>
> >> 5. Is filesystem scrubbing considered harmless ? Even with recursive
> >> force repair ?
> >>
> >> Yes, scrubbing, even with repair, is harmless.
> >>
> >> Scrubbing with repair does the following things:
> >>
> >> 1. Repair backtrace
> >>    If the on-disk and in-memory backtraces don't match, then the
> >>    DIRTYPARENT flag is set so that the journal logger thread picks up
> >>    the inode and writes the backtrace to disk.
> >>
> >> 2. Repair inode
> >>    If the on-disk and in-memory inode versions don't match, then the
> >>    inode is left untouched. Otherwise, if the inode is marked as
> >>    "free", the inode number is removed from active use.
> >>
> >> 3. Repair recursive-stats
> >>    If the on-disk and in-memory raw-stats don't match, then all the
> >>    stats for the leaves in the directory tree are marked dirty and a
> >>    scatter-gather operation is forced to coalesce the raw-stats info.
> >>
> >> 6. Is there any chance for scrubbing to overload the mds on a big file
> >> system, ie. like find . -ls ?
> >>
> >> Scrubbing on its own should not be able to overload an MDS, but it is
> >> an additional load on top of whatever client activity the MDS is
> >> serving, which could exceed the server's capacity. In short: yes, it
> >> might overload the mds in sustained high-I/O scenarios.
> >> The mds config option mds_max_scrub_ops_in_progress, which defaults to
> >> 5, decides the number of scrub operations running at any given time,
> >> so there is a small effort at throttling.
> >>
> >> 7. What is the difference between "recursive repair" and "recursive
> >> force repair" ? Is "force" harmless ?
> >>
> >> If the "force" argument is specified, then a dirfrag is scrubbed only
> >> if:
> >>
> >> 1. the dentry version is greater than the last scrub version, AND
> >> 2. the dentry type is a DIR
> >>
> >> If "force" is not specified, then dirfrag scrubbing is skipped; you
> >> will see an mds log line saying that scrubbing was skipped for the
> >> dentry.
> >>
> >> The rest of the scrubbing is done as described in Q5 above.
> >>
> >> 8. Is there any way to see which file/folder the scrub operation is at ?
> >> In fact, is there any better way to see scrub progress than "scrub
> >> status", which doesn't say much ?
> >>
> >> Currently there's no way to see which file/folder is being scrubbed.
> >> At most we could log a line in the mds logs about it, but that could
> >> soon bloat the logs if the number of entries is large.
> >>
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Milind
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx