I've created a tracker https://tracker.ceph.com/issues/54557 to track this issue.
Thanks, Chris, for bringing this to my attention.

Regards,
Milind

On Sun, Mar 13, 2022 at 1:11 AM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
> Hi Milind (or anyone else who can help...)
>
> Reading this thread made me realise I had overlooked cephfs scrubbing, so
> I tried it on a small 16.2.7 cluster. The normal forward scrub showed
> nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did find
> one backtrace error (putting the cluster into HEALTH_ERR). I then did a
> repair which according to the log did rewrite the inode, and subsequent
> scrubs have not found it.
>
> However the cluster health is still ERR, and the MDS still shows the
> damage:
>
> ceph@xxxx1:~$ ceph tell mds.0 damage ls
> 2022-03-12T18:42:01.609+0000 7f1b817fa700 0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
> 2022-03-12T18:42:01.625+0000 7f1b817fa700 0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
> [
>     {
>         "damage_type": "backtrace",
>         "id": 3308827822,
>         "ino": 256,
>         "path": "~mds0"
>     }
> ]
>
> What are the right steps from here? Has the error actually been corrected
> and just needs clearing, or is it still there?
>
> In case it is relevant: there is one active and two standby MDS. The log
> is from the node currently hosting rank 0.
>
> From the mds log:
>
> 2022-03-12T18:13:41.593+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:45.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:13:52.881+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:55.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:14:12.608+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:15.316+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
>
> Thanks, Chris
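A side note for anyone who lands here with the same symptom: assuming the repair really did rewrite the backtrace (as the log above suggests), one possible way to tidy up is to clear the stale entry from the damage table and re-scrub to confirm. This is only a sketch, not an official recovery procedure; `damage rm` merely deletes the report, it does not repair anything, and the id below is simply the one from the `damage ls` output above.

    # list recorded damage for rank 0
    ceph tell mds.0 damage ls
    # drop the stale entry by id once you are satisfied the repair succeeded
    ceph tell mds.0 damage rm 3308827822
    # re-run the scrub to confirm nothing is re-reported
    ceph tell mds.0 scrub start ~mdsdir recursive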
>
> On 11/03/2022 12:24, Milind Changire wrote:
>
> Here are some answers to your questions:
>
> On Sun, Mar 6, 2022 at 3:57 AM Arnaud M <arnaud.meauzoone@xxxxxxxxx> wrote:
>
> Hello to everyone :)
>
> Just some questions about filesystem scrubbing.
>
> In this documentation it is said that scrub will help admins check the
> consistency of the filesystem:
> https://docs.ceph.com/en/latest/cephfs/scrub/
>
> So my questions are:
>
> Is filesystem scrubbing mandatory ?
> How often should I scrub the whole filesystem (ie start at /) ?
> How often should I scrub ~mdsdir ?
> Should I set up a cronjob ?
> Is filesystem scrubbing considered harmless ? Even with recursive force
> repair ?
> Is there any chance for scrubbing to overload the mds on a big filesystem
> (ie like find . -ls) ?
> What is the difference between "recursive repair" and "recursive force
> repair" ? Is "force" harmless ?
> Is there any way to see which file/folder the scrub operation is at ? In
> fact, any better way to see scrub progress than "scrub status", which
> doesn't say much ?
>
> Sorry for all the questions, but there is not that much documentation about
> filesystem scrubbing. And I do think the answers will help a lot of cephfs
> administrators :)
>
> Thanks to all
>
> All the best
>
> Arnaud
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> 1. Is filesystem scrubbing mandatory ?
>
> As a routine system administration practice, it is good to ensure that
> your file-system is always in a good state. To avoid getting the
> file-system into a bottleneck state during work hours, it's always a good
> idea to reserve some time to run a recursive forward scrub and use the
> in-built scrub automation to fix any issues found. Although you can run the
> scrub at any directory of your choice, it's always a good practice to start
> the scrub at the file-system root once in a while.
>
> So file-system scrubbing is not mandatory but highly recommended.
>
> Filesystem scrubbing is designed to read CephFS' metadata and detect
> inconsistencies or issues that are generated by bitrot or bugs, just as
> RADOS' pg scrubbing is. In a perfect world without bugs or bit flips it
> would be unnecessary, but we don't live in that world, so a scrub can
> detect small issues before they turn into big ones, and the mere act of
> reading data can keep it fresh and give storage devices a chance to correct
> any media errors while that's still possible.
>
> We don't have a specific recommended schedule, and scrub takes up cluster IO
> and compute resources, so its frequency should be tailored to your workload.
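To make the above concrete, the operations being discussed boil down to a handful of MDS commands. This is only an illustrative sketch; "cephfs" is a placeholder for your own file-system name and rank 0 is assumed:

    # forward scrub of the whole tree, starting at the root
    ceph tell mds.cephfs:0 scrub start / recursive
    # scrub the stray/metadata directory as well
    ceph tell mds.cephfs:0 scrub start ~mdsdir recursive
    # add "repair" to have the scrub fix what it finds
    ceph tell mds.cephfs:0 scrub start / recursive,repair
    # check whether a scrub is still running
    ceph tell mds.cephfs:0 scrub status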
>
> 2. How often should I scrub the whole filesystem (ie start at /) ?
>
> Since you'd always want to have a consistent file-system, it would be good
> to run scrubbing:
>
>    1. before taking a snapshot of the entire file-system, OR
>    2. before taking a backup of the entire file-system, OR
>    3. after significant metadata activity, e.g. after creating files,
>       renaming files, deleting files, changing file attributes, etc.
>
> There's no one rule that fits all scenarios, so you'll need to follow a
> heuristic approach. The type of devices (HDD or SSD) and the amount of
> activity wearing the device are the typical factors involved when deciding
> to scrub a file-system. If you have some window dedicated to backup
> activity, then you'd want to run a recursive forward scrub with repair on
> the entire file-system before it is snapshotted and used for backup.
> Although you can run a scrub alongside active use of the file-system, it
> is always recommended that you run the scrub on a quiet file-system so that
> neither activity gets in the other's way. This also helps in completing
> the scrub task quicker.
>
> 3. How often should I scrub ~mdsdir ?
>
> ~mdsdir is used to collect deleted (stray) entries. So, the number of
> file/dir unlinks in a typical workload should be used to come up with a
> heuristic for how often to scrub it. This activity can be taken up
> separately from scrubbing the file-system root.
>
> 4. Should I set up a cron job ?
>
> Yes, you could.
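Purely as an illustration of what such a cron job could look like (the schedule, the file-system name "cephfs", and the logging are placeholders; you would also want to make sure the previous run has finished, e.g. by checking `scrub status` first):

    # /etc/cron.d/cephfs-scrub -- illustrative only
    0 2 * * 0   root  ceph tell mds.cephfs:0 scrub start / recursive 2>&1 | logger -t cephfs-scrub
    30 3 * * 0  root  ceph tell mds.cephfs:0 scrub start ~mdsdir recursive 2>&1 | logger -t cephfs-scrub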
>
> 5. Is filesystem scrubbing considered harmless ? Even with recursive force
> repair ?
>
> Yes, scrubbing even with repair is harmless.
>
> Scrubbing with repair does the following things:
>
>    1. Repair backtrace
>       If the on-disk and in-memory backtraces don't match, then the DIRTYPARENT
>       flag is set so that the journal logger thread picks up the inode for
>       writing the backtrace to disk.
>    2. Repair inode
>       If the on-disk and in-memory inode versions don't match, then the inode is
>       left untouched. Otherwise, if the inode is marked as "free", the inode
>       number is removed from active use.
>    3. Repair recursive-stats
>       If the on-disk and in-memory raw-stats don't match, then all the stats for
>       the leaves in the directory tree are marked dirty and a scatter-gather
>       operation is forced to coalesce the raw-stats info.
>
> 6. Is there any chance for scrubbing to overload the mds on a big file system,
> ie. like find . -ls ?
>
> Scrubbing on its own should not be able to overload an MDS, but it is an
> additional load on top of whatever client activity the MDS is serving,
> which could exceed the server's capacity. In short, yes, it might
> overload the mds in sustained high-I/O scenarios.
> The mds config option mds_max_scrub_ops_in_progress, which defaults to
> 5, decides the number of scrubs running at any given time. So, there is a
> small effort at throttling.
>
> 7. What is the difference between "recursive repair" and "recursive force
> repair" ? Is "force" harmless ?
>
> If the "force" argument is specified, then a dirfrag is scrubbed only if
>
>    1. the dentry version is greater than the last scrub version, AND
>    2. the dentry type is a DIR.
>
> If "force" is not specified, then dirfrag scrubbing is skipped. You will be
> able to see an mds log message saying that scrubbing was skipped for the
> dentry.
>
> The rest of the scrubbing is done as described in Q5 above.
>
> 8. Is there any way to see which file/folder the scrub operation is at ? In
> fact, any better way to see scrub progress than "scrub status", which
> doesn't say much ?
>
> Currently there's no way to see which file/folder is being scrubbed. At
> most we could log a line in the mds logs about it, but it could soon cause
> the logs to bloat if the number of entries is large.
>
> --
> Milind

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx