Chris,

After you run "scrub repair" followed by a "scrub" without any issues, and "damage ls" still shows the error, try running "damage rm" and then re-run "scrub" to see whether the system still reports damage. Please update the upstream tracker with your findings if possible.
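Concretely, the sequence would look something like this (using rank 0 and the damage id from your "damage ls" output below; substitute your own values):

    ceph tell mds.0 scrub start ~mdsdir recursive,repair   # repair pass
    ceph tell mds.0 scrub start ~mdsdir recursive          # verify pass; should come back clean
    ceph tell mds.0 damage ls                              # if the backtrace entry is still listed...
    ceph tell mds.0 damage rm 3308827822                   # ...remove it by id
    ceph tell mds.0 scrub start ~mdsdir recursive          # confirm nothing is re-flagged

Note that "damage rm" only clears the entry from the MDS damage table; if a subsequent scrub re-reports the damage, the underlying problem is still present.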
--
Milind

On Sun, Mar 13, 2022 at 2:41 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> Ok, restarting mds.0 cleared it. I then restarted the others until this
> one was again active, and repeated the scrub of ~mdsdir, which was then clean.
>
> I don't know what caused it, or why restarting the MDS was necessary, but
> it has done the trick.
>
> On 12/03/2022 19:14, Chris Palmer wrote:
> > Hi Milind (or anyone else who can help...)
> >
> > Reading this thread made me realise I had overlooked cephfs scrubbing,
> > so I tried it on a small 16.2.7 cluster. The normal forward scrub showed
> > nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did
> > find one backtrace error (putting the cluster into HEALTH_ERR). I then
> > did a repair which, according to the log, did rewrite the inode, and
> > subsequent scrubs have not found it.
> >
> > However the cluster health is still ERR, and the MDS still shows the
> > damage:
> >
> > ceph@xxxx1:~$ ceph tell mds.0 damage ls
> > 2022-03-12T18:42:01.609+0000 7f1b817fa700 0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
> > 2022-03-12T18:42:01.625+0000 7f1b817fa700 0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
> > [
> >     {
> >         "damage_type": "backtrace",
> >         "id": 3308827822,
> >         "ino": 256,
> >         "path": "~mds0"
> >     }
> > ]
> >
> > What are the right steps from here? Has the error actually been
> > corrected but just needs clearing, or is it still there?
> >
> > In case it is relevant: there is one active MDS and two standbys. The
> > log is from the node currently hosting rank 0.
> >
> > From the mds log:
> >
> > 2022-03-12T18:13:41.593+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
> > 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> > 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> > 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
> > 2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> > 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:45.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
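A note on the scrub-error JSON above: "read_ret_val":-61 is -ENODATA, i.e. the MDS could not read the backtrace xattr off disk. For anyone who wants to inspect the raw on-disk backtrace, something along these lines should work (the metadata pool name here is a placeholder for your own, and 100.00000000 is the object backing inode 0x100):

    rados -p cephfs_metadata getxattr 100.00000000 parent > /tmp/backtrace_0x100
    ceph-dencoder type inode_backtrace_t import /tmp/backtrace_0x100 decode dump_json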
> > 2022-03-12T18:13:52.881+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> > 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:13:55.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
> >
> > 2022-03-12T18:14:12.608+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> > 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> > 2022-03-12T18:14:15.316+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
> >
> > Thanks, Chris
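While a scrub like the ones above is running, its state and any resulting health flags can be checked with:

    ceph tell mds.0 scrub status
    ceph health detail

Neither shows the exact file being scrubbed (as Milind notes below), but they do show the active scrub paths and any damage-related health errors.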
> > On 11/03/2022 12:24, Milind Changire wrote:
> >> Here's some answers to your questions:
> >>
> >> On Sun, Mar 6, 2022 at 3:57 AM Arnaud M <arnaud.meauzoone@xxxxxxxxx> wrote:
> >>
> >>> Hello to everyone :)
> >>>
> >>> Just some questions about filesystem scrubbing.
> >>>
> >>> In this documentation it is said that scrub will help admins check the
> >>> consistency of the filesystem:
> >>>
> >>> https://docs.ceph.com/en/latest/cephfs/scrub/
> >>>
> >>> So my questions are:
> >>>
> >>> Is filesystem scrubbing mandatory ?
> >>> How often should I scrub the whole filesystem (ie start at /) ?
> >>> How often should I scrub ~mdsdir ?
> >>> Should I set up a cron job ?
> >>> Is filesystem scrubbing considered harmless ? Even with recursive force repair ?
> >>> Is there any chance for scrubbing to overload the mds on a big file system (ie like find . -ls) ?
> >>> What is the difference between "recursive repair" and "recursive force repair" ? Is "force" harmless ?
> >>> Is there any way to see which file/folder the scrub operation is at ? In fact, is there any better way to see scrub progress than "scrub status", which doesn't say much ?
> >>>
> >>> Sorry for all the questions, but there is not that much documentation
> >>> about filesystem scrubbing. And I do think the answers will help a lot
> >>> of cephfs administrators :)
> >>>
> >>> Thanks to all
> >>>
> >>> All the best
> >>>
> >>> Arnaud
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>
> >> 1. Is filesystem scrubbing mandatory ?
> >>
> >> As a routine system administration practice, it is good to ensure that
> >> your file-system is always in a good state. To avoid getting the
> >> file-system into a bottleneck state during work hours, it's always a
> >> good idea to reserve some quiet time to run a recursive forward scrub
> >> and let the built-in scrub automation fix any issues it finds. Although
> >> you can run the scrub at any directory of your choice, it's always good
> >> practice to start the scrub at the file-system root once in a while.
> >>
> >> So file-system scrubbing is not mandatory, but highly recommended.
> >>
> >> Filesystem scrubbing is designed to read CephFS's metadata and detect
> >> inconsistencies or issues that are generated by bitrot or bugs, just as
> >> RADOS's pg scrubbing is. In a perfect world without bugs or bit flips
> >> it would be unnecessary, but we don't live in that world, so a scrub
> >> can detect small issues before they turn into big ones, and the mere
> >> act of reading data can keep it fresh and give storage devices a chance
> >> to correct any media errors while that's still possible.
> >>
> >> We don't have a specific recommended schedule, and scrub takes up
> >> cluster IO and compute resources, so its frequency should be tailored
> >> to your workload.
> >>
> >> 2. How often should I scrub the whole filesystem (ie start at /) ?
> >>
> >> Since you'd always want to have a consistent file-system, it would be
> >> good to run scrubbing:
> >>
> >> 1. before taking a snapshot of the entire file-system, OR
> >> 2. before taking a backup of the entire file-system, OR
> >> 3. after significant metadata activity, eg. after creating, renaming,
> >>    or deleting files, changing file attributes, etc.
> >>
> >> There's no one-rule-fits-all scenario, so you'll need to follow a
> >> heuristic approach. The type of device (HDD or SSD) and the amount of
> >> activity wearing the device are the typical factors involved when
> >> deciding to scrub a file-system. If you have some window dedicated to
> >> backup activity, then you'd want to run a recursive forward scrub with
> >> repair on the entire file-system before it is snapshotted and used for
> >> backup. Although you can run a scrub alongside active use of the
> >> file-system, it is always recommended to run the scrub on a quiet
> >> file-system so that the two activities don't get in each other's way.
> >> This also helps the scrub complete more quickly.
> >>
> >> 3. How often should I scrub ~mdsdir ?
> >>
> >> ~mdsdir is used to collect deleted (stray) entries. So, the number of
> >> file/dir unlinks in a typical workload should be used to come up with
> >> a heuristic for scrubbing it. This activity can be taken up separately
> >> from scrubbing the file-system root.
> >>
> >> 4. Should I set up a cron job ?
> >>
> >> Yes, you could.
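For example, a weekly off-peak scrub of the whole filesystem could be scheduled with a crontab entry along these lines (the filesystem name "cephfs", the schedule, and the assumption that the cron user has a working ceph keyring are all placeholders to adapt):

    # every Sunday at 03:00, forward-scrub the filesystem from the root
    0 3 * * 0  ceph tell mds.cephfs:0 scrub start / recursive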
> >>
> >> 5. Is filesystem scrubbing considered harmless ? Even with recursive
> >> force repair ?
> >>
> >> Yes, scrubbing, even with repair, is harmless.
> >>
> >> Scrubbing with repair does the following things:
> >>
> >> 1. Repair backtrace
> >>    If the on-disk and in-memory backtraces don't match, then the
> >>    DIRTYPARENT flag is set so that the journal logger thread picks up
> >>    the inode and writes the backtrace to disk.
> >>
> >> 2. Repair inode
> >>    If the on-disk and in-memory inode versions don't match, then the
> >>    inode is left untouched. Otherwise, if the inode is marked as
> >>    "free", the inode number is removed from active use.
> >>
> >> 3. Repair recursive-stats
> >>    If the on-disk and in-memory raw-stats don't match, then all the
> >>    stats for the leaves in the directory tree are marked dirty and a
> >>    scatter-gather operation is forced to coalesce the raw-stats info.
> >>
> >> 6. Is there any chance for scrubbing to overload the mds on a big file
> >> system, ie. like find . -ls ?
> >>
> >> Scrubbing on its own should not be able to overload an MDS, but it is
> >> an additional load on top of whatever client activity the MDS is
> >> serving, which could exceed the server's capacity. In short: yes, it
> >> might overload the mds in sustained high-I/O scenarios.
> >> The mds config option mds_max_scrub_ops_in_progress, which defaults to
> >> 5, decides the number of scrub operations running at any given time,
> >> so there is a small effort at throttling.
> >>
> >> 7. What is the difference between "recursive repair" and "recursive
> >> force repair" ? Is "force" harmless ?
> >>
> >> If the "force" argument is specified, then a dirfrag is scrubbed only
> >> if:
> >>
> >> 1. the dentry version is greater than the last scrub version, AND
> >> 2. the dentry type is a DIR
> >>
> >> If "force" is not specified, then dirfrag scrubbing is skipped; you
> >> will see an mds log line saying that scrubbing was skipped for the
> >> dentry.
> >>
> >> The rest of the scrubbing is done as described in Q5 above.
> >>
> >> 8. Is there any way to see which file/folder the scrub operation is at ?
> >> In fact, is there any better way to see scrub progress than "scrub
> >> status", which doesn't say much ?
> >>
> >> Currently there's no way to see which file/folder is being scrubbed.
> >> At most we could log a line in the mds logs about it, but that could
> >> soon bloat the logs if the number of entries is large.
> >>
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Milind
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx