Chris,
If a "scrub repair" followed by a plain "scrub" completes without any
issues, but "damage ls" still shows the error, try running "damage rm"
and re-run the "scrub" to see whether the system still reports any
damage.
Please update the upstream tracker with your findings if possible.
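For example (an untested sketch; mds.0 means rank 0, and the damage id
is the one reported by "damage ls", i.e. 3308827822 in your output
below):

    ceph tell mds.0 scrub start ~mdsdir recursive,repair
    ceph tell mds.0 scrub start ~mdsdir recursive    # expect a clean run
    ceph tell mds.0 damage ls                        # error still listed?
    ceph tell mds.0 damage rm 3308827822             # id from damage ls
    ceph tell mds.0 scrub start ~mdsdir recursive    # does it reappear?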
--
Milind
On Sun, Mar 13, 2022 at 2:41 AM Chris Palmer <chris.palmer@xxxxxxxxx>
wrote:
Ok, restarting mds.0 cleared it. I then restarted the others until this
one was again active, and repeated the scrub ~mdsdir, which was then
clean.

I don't know what caused it, or why restarting the MDS was necessary,
but it has done the trick.
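(For anyone hitting the same thing, the sequence was essentially the
following; the daemon name is a placeholder and the restart mechanism
depends on your deployment:

    ceph mds fail <rank-0-daemon-name>             # standby takes over rank 0
    ceph tell mds.0 scrub start ~mdsdir recursive  # re-check once active
)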
On 12/03/2022 19:14, Chris Palmer wrote:
> Hi Miland (or anyone else who can help...)
>
> Reading this thread made me realise I had overlooked cephfs scrubbing,
> so I tried it on a small 16.2.7 cluster. The normal forward scrub
> showed nothing. However, "ceph tell mds.0 scrub start ~mdsdir
> recursive" did find one backtrace error (putting the cluster into
> HEALTH_ERR). I then did a repair which, according to the log, did
> rewrite the inode, and subsequent scrubs have not found it.
>
> However the cluster health is still ERR, and the MDS still shows the
> damage:
>
> ceph@xxxx1:~$ ceph tell mds.0 damage ls
> 2022-03-12T18:42:01.609+0000 7f1b817fa700 0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
> 2022-03-12T18:42:01.625+0000 7f1b817fa700 0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
> [
>     {
>         "damage_type": "backtrace",
>         "id": 3308827822,
>         "ino": 256,
>         "path": "~mds0"
>     }
> ]
>
> What are the right steps from here? Has the error actually been
> corrected but just needs clearing, or is it still there?
>
> In case it is relevant: there is one active and two standby MDS. The
> log is from the node currently hosting rank 0.
>
> From the mds log:
>
> 2022-03-12T18:13:41.593+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:41.593+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:41.601+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:45.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:13:52.881+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memoryvalue":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
> 2022-03-12T18:13:52.881+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:13:55.317+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
> 2022-03-12T18:14:12.608+0000 7f61d30c1700 1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
> 2022-03-12T18:14:12.608+0000 7f61cb0b1700 0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
> 2022-03-12T18:14:15.316+0000 7f61cf8ba700 0 log_channel(cluster) log [INF] : scrub summary: idle
>
>
> Thanks, Chris
>
>
> On 11/03/2022 12:24, Milind Changire wrote:
>> Here are some answers to your questions:
>>
>> On Sun, Mar 6, 2022 at 3:57 AM Arnaud M <arnaud.meauzoone@xxxxxxxxx>
>> wrote:
>>
>>> Hello to everyone :)
>>>
>>> Just some questions about filesystem scrubbing.
>>>
>>> In this documentation it is said that scrub will help admins check
>>> the consistency of the filesystem:
>>>
>>> https://docs.ceph.com/en/latest/cephfs/scrub/
>>>
>>> So my questions are:
>>>
>>> Is filesystem scrubbing mandatory ?
>>> How often should I scrub the whole filesystem (ie start at /) ?
>>> How often should I scrub ~mdsdir ?
>>> Should I set up a cronjob ?
>>> Is filesystem scrubbing considered harmless ? Even with recursive
>>> force repair ?
>>> Is there any chance for scrubbing to overload the mds on a big
>>> filesystem (ie like find . -ls) ?
>>> What is the difference between "recursive repair" and "recursive
>>> force repair" ? Is "force" harmless ?
>>> Is there any way to see which file/folder the scrub operation is at ?
>>> In fact, any better way to see scrub progress than "scrub status",
>>> which doesn't say much ?
>>>
>>> Sorry for all the questions, but there is not that much documentation
>>> about filesystem scrubbing. And I do think the answers will help a
>>> lot of cephfs administrators :)
>>>
>>> Thanks to all
>>>
>>> All the best
>>>
>>> Arnaud
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>> 1. Is filesystem scrubbing mandatory ?
>>
>> As a routine system administration practice, it is good to ensure
>> that your file-system is always in a good state. To avoid getting the
>> file-system into a bottleneck state during work hours, it's always a
>> good idea to reserve some time to run a recursive forward scrub and
>> use the in-built scrub automation to fix such issues. Although you
>> can run the scrub at any directory of your choice, it's always a good
>> practice to start the scrub at the file-system root once in a while.
>>
>> So file-system scrubbing is not mandatory but highly recommended.
>>
>> Filesystem scrubbing is designed to read CephFS’ metadata and detect
>> inconsistencies or issues that are generated by bitrot or bugs, just
>> as RADOS’ pg scrubbing is. In a perfect world without bugs or bit
>> flips it would be unnecessary, but we don’t live in that world — so a
>> scrub can detect small issues before they turn into big ones, and the
>> mere act of reading data can keep it fresh and give storage devices a
>> chance to correct any media errors while that’s still possible.
>>
>> We don’t have a specific recommended schedule, and scrub takes up
>> cluster IO and compute resources, so its frequency should be tailored
>> to your workload.
>>
>> 2. How often should I scrub the whole filesystem (ie start at /) ?
>>
>> Since you'd always want to have a consistent file-system, it would be
>> good to run scrubbing:
>>
>>    1. before taking a snapshot of the entire file-system, OR
>>    2. before taking a backup of the entire file-system, OR
>>    3. after significant metadata activity, e.g. after creating files,
>>       renaming files, deleting files, or changing file attributes.
>>
>> There's no one-rule-fixes-all scenario, so you'll need to follow a
>> heuristic approach. The type of devices (HDD or SSD) and the amount
>> of activity wearing the device are the typical factors involved when
>> deciding to scrub a file-system. If you have some window dedicated
>> for backup activity, then you’d want to run a recursive forward scrub
>> with repair on the entire file-system before it is snapshotted and
>> used for backup. Although you can run a scrub alongside active use of
>> the file-system, it is always recommended that you run the scrub on a
>> quiet file-system so that neither activity gets in the other’s way.
>> This also helps in completing the scrub task quicker.
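>>
>> As a sketch, a pre-backup run could look like this (the mount path is
>> a placeholder; CephFS snapshots are created via the .snap directory):
>>
>>     ceph tell mds.0 scrub start / recursive,repair
>>     ceph tell mds.0 scrub status              # wait until it reports idle
>>     mkdir /mnt/cephfs/.snap/pre-backup        # then take the snapshot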
>>
>>
>> 3. How often should I scrub ~mdsdir ?
>>
>> ~mdsdir is used to collect deleted (stray) entries. So, the number of
>> file/dir unlinks in a typical workload should be used to come up with
>> a heuristic to scrub the file-system. This activity can be taken up
>> separately from scrubbing the file-system root.
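>>
>> For example, using the same form Chris used above:
>>
>>     ceph tell mds.0 scrub start ~mdsdir recursive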
>>
>>
>>
>> 4. Should I set up a cron job ?
>>
>> Yes, you could.
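>>
>> For example, a weekly crontab entry could look like this (the
>> schedule is arbitrary; pick a quiet window for your workload):
>>
>>     0 2 * * 0  ceph tell mds.0 scrub start / recursive,repair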
>>
>>
>> 5. Is filesystem scrubbing considered harmless ? Even with recursive
>> force repair ?
>>
>> Yes, scrubbing even with repair is harmless.
>>
>> Scrubbing with repair does the following things (see the example
>> after the list):
>>
>>    1. Repair backtrace
>>       If the on-disk and in-memory backtraces don't match, then the
>>       DIRTYPARENT flag is set so that the journal logger thread picks
>>       up the inode for writing the backtrace to disk.
>>
>>    2. Repair inode
>>       If the on-disk and in-memory inode versions don't match, then
>>       the inode is left untouched. Otherwise, if the inode is marked
>>       as "free", the inode number is removed from active use.
>>
>>    3. Repair recursive-stats
>>       If the on-disk and in-memory raw-stats don't match, then all
>>       the stats for the leaves in the directory tree are marked dirty
>>       and a scatter-gather operation is forced to coalesce the
>>       raw-stats info.
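>>
>> You can see repair (1) in Chris's log above: the repair pass logs
>> "bad backtrace on inode 0x100(~mds0), rewriting it" followed by
>> "Scrub repaired inode 0x100 (~mds0)". All three repairs are driven by
>> the same scrub command with the repair op, e.g.:
>>
>>     ceph tell mds.0 scrub start / recursive,repair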
>>
>>
>>
>> 6. Is there any chance for scrubbing to overload the mds on a big
>> file system, ie. like find . -ls ?
>>
>> Scrubbing on its own should not be able to overload an MDS, but it is
>> an additional load on top of whatever client activity the MDS is
>> serving, which could exceed the server’s capacity. In short, yes, it
>> might overload the mds in sustained high-I/O scenarios.
>>
>> The mds config option mds_max_scrub_ops_in_progress, which defaults
>> to 5, decides the number of scrubs running at any given time. So,
>> there is a small effort at throttling.
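>>
>> A minimal sketch for inspecting and adjusting that throttle via the
>> standard config interface (the value 10 is just an example):
>>
>>     ceph config get mds mds_max_scrub_ops_in_progress
>>     ceph config set mds mds_max_scrub_ops_in_progress 10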
>>
>>
>>
>> 7. What is the difference between "recursive repair" and "recursive
>> force repair" ? Is "force" harmless ?
>>
>> If the "force" argument is specified, then a dirfrag is scrubbed only
>> if
>>
>>    1. the dentry version is greater than the last scrub version, AND
>>    2. the dentry type is a DIR.
>>
>> If "force" is not specified, then dirfrag scrubbing is skipped. You
>> will be able to see an mds log line saying that scrubbing was skipped
>> for the dentry.
>>
>> The rest of the scrubbing is done as described in Q5 above.
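>>
>> "force" is just another scrubops token, so the two variants differ
>> only in the ops passed, e.g.:
>>
>>     ceph tell mds.0 scrub start / recursive,repair
>>     ceph tell mds.0 scrub start / recursive,repair,force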
>>
>>
>> 8. Is there any way to see which file/folder the scrub operation is
>> at ? In fact, any better way to see scrub progress than "scrub
>> status", which doesn't say much ?
>>
>> Currently there's no way to see which file/folder is being scrubbed.
>> At most we could log a line in the mds logs about it, but that could
>> soon cause the logs to bloat if the number of entries is large.
>>
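>> What you can do today is poll the summary interface, sketched below
>> (abort and pause are also available if a scrub needs to be stopped):
>>
>>     ceph tell mds.0 scrub status
>>     ceph tell mds.0 scrub pause
>>     ceph tell mds.0 scrub abort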
>>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx