Chris,
If a "scrub repair" followed by a plain "scrub" completes without any
issues, but "damage ls" still shows the error, try running "damage rm"
and re-run the "scrub" to see whether the system still reports any
damage.
Please update the upstream tracker with your findings if possible.
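For example (an untested sketch; mds.0 means rank 0, and the damage id
is the one reported by "damage ls", i.e. 3308827822 in your output
below):

    ceph tell mds.0 scrub start ~mdsdir recursive,repair
    ceph tell mds.0 scrub start ~mdsdir recursive    # expect a clean run
    ceph tell mds.0 damage ls                        # error still listed?
    ceph tell mds.0 damage rm 3308827822             # id from damage ls
    ceph tell mds.0 scrub start ~mdsdir recursive    # does it reappear?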
--
Milind
On Sun, Mar 13, 2022 at 2:41 AM Chris Palmer <chris.palmer@xxxxxxxxx>
wrote:
Ok, restarting mds.0 cleared it. I then restarted the others until this
one was again active, and repeated the scrub ~mdsdir, which was then
clean.

I don't know what caused it, or why restarting the MDS was necessary,
but it has done the trick.
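(For anyone hitting the same thing, the sequence was essentially the
following; the daemon name is a placeholder and the restart mechanism
depends on your deployment:

    ceph mds fail <rank-0-daemon-name>             # standby takes over rank 0
    ceph tell mds.0 scrub start ~mdsdir recursive  # re-check once active
)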
On 12/03/2022 19:14, Chris Palmer wrote:
> Hi Miland (or anyone else who can help...)
>
> Reading this thread made me realise I had overlooked cephfs scrubbing,
> so I tried it on a small 16.2.7 cluster. The normal forward scrub
> showed nothing. However, "ceph tell mds.0 scrub start ~mdsdir
> recursive" did find one backtrace error (putting the cluster into
> HEALTH_ERR). I then did a repair which, according to the log, did
> rewrite the inode, and subsequent scrubs have not found it.
>
> However the cluster health is still ERR, and the MDS still shows the
> damage:
>
> ceph@xxxx1:~$ ceph tell mds.0 damage ls
> 2022-03-12T18:42:01.609+0000 7f1b817fa700 0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
> 2022-03-12T18:42:01.625+0000 7f1b817fa700 0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
> [
>     {
>         "damage_type": "backtrace",
>         "id": 3308827822,
>         "ino": 256,
>         "path": "~mds0"
>     }
> ]
>
> What are the right steps from here? Has the error actually been
> corrected but just needs clearing, or is it still there?
>
> In case it is relevant: there is one active and two standby MDS. The
> log is from the node currently hosting rank 0.
>
> From the mds log:
>
> 2022-03-12T18:13:41.593+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:45.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:13:52.881+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:55.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:14:12.608+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:15.316+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
>
> Thanks, Chris
>
>
> On 11/03/2022 12:24, Milind Changire wrote:
>> Here are some answers to your questions:
>>
>> On Sun, Mar 6, 2022 at 3:57 AM Arnaud M <arnaud.meauzoone@xxxxxxxxx>
>> wrote:
>>
>>> Hello to everyone :)
>>>
>>> Just some questions about filesystem scrubbing.
>>>
>>> In this documentation it is said that scrub will help admins check
>>> the consistency of the filesystem:
>>>
>>> https://docs.ceph.com/en/latest/cephfs/scrub/
>>>
>>> So my questions are:
>>>
>>> Is filesystem scrubbing mandatory ?
>>> How often should I scrub the whole filesystem (ie start at /) ?
>>> How often should I scrub ~mdsdir ?
>>> Should I set up a cronjob ?
>>> Is filesystem scrubbing considered harmless ? Even with recursive
>>> force repair ?
>>> Is there any chance for scrubbing to overload the mds on a big
>>> filesystem (ie like find . -ls) ?
>>> What is the difference between "recursive repair" and "recursive
>>> force repair" ? Is "force" harmless ?
>>> Is there any way to see which file/folder the scrub operation is at ?
>>> In fact, any better way to see scrub progress than "scrub status",
>>> which doesn't say much ?
>>>
>>> Sorry for all the questions, but there is not that much documentation
>>> about filesystem scrubbing. And I do think the answers will help a
>>> lot of cephfs administrators :)
>>>
>>> Thanks to all
>>>
>>> All the best
>>>
>>> Arnaud
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>> 1. Is filesystem scrubbing mandatory ?
>>
>> As a routine system administration practice, it is good to ensure
>> that your file-system is always in a good state. To avoid getting the
>> file-system into a bottleneck state during work hours, it's always a
>> good idea to reserve some time to run a recursive forward scrub and
>> use the in-built scrub automation to fix such issues. Although you
>> can run the scrub at any directory of your choice, it's always a good
>> practice to start the scrub at the file-system root once in a while.
>>
>> So file-system scrubbing is not mandatory but highly recommended.
>>
>> Filesystem scrubbing is designed to read CephFS’ metadata and detect
>> inconsistencies or issues that are generated by bitrot or bugs, just
>> as RADOS’ pg scrubbing is. In a perfect world without bugs or bit
>> flips it would be unnecessary, but we don’t live in that world — so a
>> scrub can detect small issues before they turn into big ones, and the
>> mere act of reading data can keep it fresh and give storage devices a
>> chance to correct any media errors while that’s still possible.
>>
>> We don’t have a specific recommended schedule, and scrub takes up
>> cluster IO and compute resources, so its frequency should be tailored
>> to your workload.
>>
>> 2. How often should I scrub the whole filesystem (ie start at /) ?
>>
>> Since you'd always want to have a consistent file-system, it would be
>> good to run scrubbing:
>>
>>    1. before taking a snapshot of the entire file-system, OR
>>    2. before taking a backup of the entire file-system, OR
>>    3. after significant metadata activity, e.g. after creating files,
>>       renaming files, deleting files, or changing file attributes.
>>
>> There's no one-rule-fixes-all scenario, so you'll need to follow a
>> heuristic approach. The type of devices (HDD or SSD) and the amount
>> of activity wearing the device are the typical factors involved when
>> deciding to scrub a file-system. If you have some window dedicated
>> for backup activity, then you’d want to run a recursive forward scrub
>> with repair on the entire file-system before it is snapshotted and
>> used for backup. Although you can run a scrub alongside active use of
>> the file-system, it is always recommended that you run the scrub on a
>> quiet file-system so that neither activity gets in the other’s way.
>> This also helps in completing the scrub task quicker.
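>>
>> As a sketch, a pre-backup run could look like this (the mount path is
>> a placeholder; CephFS snapshots are created via the .snap directory):
>>
>>     ceph tell mds.0 scrub start / recursive,repair
>>     ceph tell mds.0 scrub status              # wait until it reports idle
>>     mkdir /mnt/cephfs/.snap/pre-backup        # then take the snapshot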
>>
>>
>> 3. How often should I scrub ~mdsdir ?
>>
>> ~mdsdir is used to collect deleted (stray) entries. So, the number of
>> file/dir unlinks in a typical workload should be used to come up with
>> a heuristic to scrub the file-system. This activity can be taken up
>> separately from scrubbing the file-system root.
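>>
>> For example, using the same form Chris used above:
>>
>>     ceph tell mds.0 scrub start ~mdsdir recursive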
>>
>>
>>
>> 4. Should I set up a cron job ?
>>
>> Yes, you could.
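>>
>> For example, a weekly crontab entry could look like this (the
>> schedule is arbitrary; pick a quiet window for your workload):
>>
>>     0 2 * * 0  ceph tell mds.0 scrub start / recursive,repair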
>>
>>
>> 5. Is filesystem scrubbing considered harmless ? Even with recursive
>> force repair ?
>>
>> Yes, scrubbing even with repair is harmless.
>>
>> Scrubbing with repair does the following things (see the example
>> after the list):
>>
>>    1. Repair backtrace
>>       If the on-disk and in-memory backtraces don't match, then the
>>       DIRTYPARENT flag is set so that the journal logger thread picks
>>       up the inode for writing the backtrace to disk.
>>
>>    2. Repair inode
>>       If the on-disk and in-memory inode versions don't match, then
>>       the inode is left untouched. Otherwise, if the inode is marked
>>       as "free", the inode number is removed from active use.
>>
>>    3. Repair recursive-stats
>>       If the on-disk and in-memory raw-stats don't match, then all
>>       the stats for the leaves in the directory tree are marked dirty
>>       and a scatter-gather operation is forced to coalesce the
>>       raw-stats info.
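>>
>> You can see repair (1) in Chris's log above: the repair pass logs
>> "bad backtrace on inode 0x100(~mds0), rewriting it" followed by
>> "Scrub repaired inode 0x100 (~mds0)". All three repairs are driven by
>> the same scrub command with the repair op, e.g.:
>>
>>     ceph tell mds.0 scrub start / recursive,repair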
>>
>>
>>
>> 6. Is there any chance for scrubbing to overload the mds on a big
>> file system, ie. like find . -ls ?
>>
>> Scrubbing on its own should not be able to overload an MDS, but it is
>> an additional load on top of whatever client activity the MDS is
>> serving, which could exceed the server’s capacity. In short, yes, it
>> might overload the mds in sustained high-I/O scenarios.
>>
>> The mds config option mds_max_scrub_ops_in_progress, which defaults
>> to 5, decides the number of scrubs running at any given time. So,
>> there is a small effort at throttling.
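>>
>> A minimal sketch for inspecting and adjusting that throttle via the
>> standard config interface (the value 10 is just an example):
>>
>>     ceph config get mds mds_max_scrub_ops_in_progress
>>     ceph config set mds mds_max_scrub_ops_in_progress 10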
>>
>>
>>
>> 7. What is the difference between "recursive repair" and "recursive
>> force repair" ? Is "force" harmless ?
>>
>> If the "force" argument is specified, then a dirfrag is scrubbed only
>> if
>>
>>    1. the dentry version is greater than the last scrub version, AND
>>    2. the dentry type is a DIR.
>>
>> If "force" is not specified, then dirfrag scrubbing is skipped. You
>> will be able to see an mds log line saying that scrubbing was skipped
>> for the dentry.
>>
>> The rest of the scrubbing is done as described in Q5 above.
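>>
>> "force" is just another scrubops token, so the two variants differ
>> only in the ops passed, e.g.:
>>
>>     ceph tell mds.0 scrub start / recursive,repair
>>     ceph tell mds.0 scrub start / recursive,repair,force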
>>
>>
>> 8. Is there any way to see which file/folder the scrub operation is
>> at ? In fact, any better way to see scrub progress than "scrub
>> status", which doesn't say much ?
>>
>> Currently there's no way to see which file/folder is being scrubbed.
>> At most we could log a line in the mds logs about it, but that could
>> soon cause the logs to bloat if the number of entries is large.
>>
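>> What you can do today is poll the summary interface, sketched below
>> (abort and pause are also available if a scrub needs to be stopped):
>>
>>     ceph tell mds.0 scrub status
>>     ceph tell mds.0 scrub pause
>>     ceph tell mds.0 scrub abort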
>>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx