I don't know what caused it, or why restarting the MDS was necessary, but it has done the trick.
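For anyone who ends up in the same place, a rough sketch of the check-and-restart sequence (assuming a single active MDS at rank 0; the exact restart method depends on how your daemons are deployed):

    # confirm the stale entry is still listed after the repair
    ceph tell mds.0 damage ls
    # bounce the active MDS so a standby takes over (or restart the
    # ceph-mds service/container on that host instead)
    ceph mds fail 0
    # once the standby has become active, re-check
    ceph tell mds.0 damage ls
    ceph health detail

If an entry were to persist even after a successful repair and MDS restart, `damage rm <id>` (with the id shown by `damage ls`) is also available via `ceph tell`, but treat that as a last resort since it only clears the record rather than fixing anything.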
On 12/03/2022 19:14, Chris Palmer wrote:
Hi Milind (or anyone else who can help...)

Reading this thread made me realise I had overlooked cephfs scrubbing, so I tried it on a small 16.2.7 cluster. The normal forward scrub showed nothing. However "ceph tell mds.0 scrub start ~mdsdir recursive" did find one backtrace error (putting the cluster into HEALTH_ERR). I then did a repair which, according to the log, did rewrite the inode, and subsequent scrubs have not found it. However the cluster health is still ERR, and the MDS still shows the damage:

ceph@xxxx1:~$ ceph tell mds.0 damage ls
2022-03-12T18:42:01.609+0000 7f1b817fa700  0 client.173985213 ms_handle_reset on v2:192.168.80.121:6824/939134894
2022-03-12T18:42:01.625+0000 7f1b817fa700  0 client.173985219 ms_handle_reset on v2:192.168.80.121:6824/939134894
[
    {
        "damage_type": "backtrace",
        "id": 3308827822,
        "ino": 256,
        "path": "~mds0"
    }
]

What are the right steps from here? Has the error actually been corrected and just needs clearing, or is it still there?

In case it is relevant: there is one active and two standby MDS. The log is from the node currently hosting rank 0.

From the mds log:

2022-03-12T18:13:41.593+0000 7f61d30c1700  1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive]} (starting...)
2022-03-12T18:13:41.593+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
2022-03-12T18:13:41.593+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:41.593+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
2022-03-12T18:13:41.601+0000 7f61cb0b1700  0 log_channel(cluster) log [WRN] : Scrub error on inode 0x100 (~mds0) see mds.xxxx1 log and `damage ls` output for details
2022-03-12T18:13:41.601+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memory_value":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
2022-03-12T18:13:41.601+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:45.317+0000 7f61cf8ba700  0 log_channel(cluster) log [INF] : scrub summary: idle
2022-03-12T18:13:52.881+0000 7f61d30c1700  1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [WRN] : bad backtrace on inode 0x100(~mds0), rewriting it
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : Scrub repaired inode 0x100 (~mds0)
2022-03-12T18:13:52.881+0000 7f61cb0b1700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x100 [...2,head] ~mds0/ auth v6798 ap=1 snaprealm=0x55d595484800 DIRTYPARENT f(v0 10=0+10) n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)/n(v0 rc2019-10-29T10:52:34.302967+0000 11=0+11) (inest lock) (iversion lock) | dirtyscattered=0 lock=0 dirfrag=1 openingsnapparents=0 dirtyparent=1 dirty=1 authpin=1 scrubqueue=0 0x55d595486000]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//[]","memory_value":"(11)0x100:[]//[]","error_str":"failed to read off disk; see retval"},"raw_stats":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 10=0+10)","ondisk_value.rstat":"n(v0 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","memory_value.dirstat":"f(v0 10=0+10)","memory_value.rstat":"n(v1815 rc2022-03-12T16:01:44.218294+0000 b1017620718 375=364+11)","error_str":""},"return_code":-61}
2022-03-12T18:13:52.881+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:13:55.317+0000 7f61cf8ba700  0 log_channel(cluster) log [INF] : scrub summary: idle
2022-03-12T18:14:12.608+0000 7f61d30c1700  1 mds.xxxx1 asok_command: scrub start {path=~mdsdir,prefix=scrub start,scrubops=[recursive,repair]} (starting...)
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub queued for path: ~mds0
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: active paths [~mds0]
2022-03-12T18:14:12.608+0000 7f61cb0b1700  0 log_channel(cluster) log [INF] : scrub summary: idle+waiting paths [~mds0]
2022-03-12T18:14:15.316+0000 7f61cf8ba700  0 log_channel(cluster) log [INF] : scrub summary: idle

Thanks, Chris

On 11/03/2022 12:24, Milind Changire wrote:
Here's some answers to your questions:

On Sun, Mar 6, 2022 at 3:57 AM Arnaud M <arnaud.meauzoone@xxxxxxxxx> wrote:
Hello to everyone :)

Just some questions about filesystem scrubbing. In this documentation it is said that scrub will help admins check the consistency of the filesystem: https://docs.ceph.com/en/latest/cephfs/scrub/

So my questions are:
Is filesystem scrubbing mandatory?
How often should I scrub the whole filesystem (ie start at /)?
How often should I scrub ~mdsdir?
Should I set up a cronjob?
Is filesystem scrubbing considered harmless? Even with recursive force repair?
Is there any chance for scrubbing to overload the mds on a big filesystem (ie like find . -ls)?
What is the difference between "recursive repair" and "recursive force repair"? Is "force" harmless?
Is there any way to see at which file/folder the scrub operation is? In fact, any better way to see scrub progress than "scrub status", which doesn't say much?

Sorry for all the questions, but there is not that much documentation about filesystem scrubbing, and I do think the answers will help a lot of cephfs administrators :)

Thanks to all
All the best
Arnaud

1. Is filesystem scrubbing mandatory?

As a routine system administration practice, it is good to ensure that your file-system is always in a good state. To avoid getting the file-system into a bottleneck state during work hours, it's always a good idea to reserve some time to run a recursive forward scrub and use the in-built scrub automation to fix such issues. Although you can run the scrub at any directory of your choice, it's always a good practice to start the scrub at the file-system root once in a while. So file-system scrubbing is not mandatory but highly recommended.

Filesystem scrubbing is designed to read CephFS' metadata and detect inconsistencies or issues that are generated by bitrot or bugs, just as RADOS' pg scrubbing is. In a perfect world without bugs or bit flips it would be unnecessary, but we don't live in that world, so a scrub can detect small issues before they turn into big ones, and the mere act of reading data can keep it fresh and give storage devices a chance to correct any media errors while that's still possible. We don't have a specific recommended schedule, and scrub takes up cluster IO and compute resources, so its frequency should be tailored to your workload.
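For example, a recursive forward scrub of the whole tree can be queued and checked like this (assuming a single active MDS reachable as mds.0; use mds.<fsname>:0 or your daemon's name if that form does not resolve):

    # queue a recursive forward scrub starting at the filesystem root
    ceph tell mds.0 scrub start / recursive
    # coarse view of queued/active scrubs
    ceph tell mds.0 scrub status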
2. How often should I scrub the whole filesystem (ie start at /)?

Since you'd always want to have a consistent file-system, it would be good to run scrubbing:
1. before taking a snapshot of the entire file-system, OR
2. before taking a backup of the entire file-system, OR
3. after significant metadata activity, e.g. after creating files, renaming files, deleting files, changing file attributes, etc.

There's no one-rule-fixes-all scenario, so you'll need to follow a heuristic approach. The type of devices (HDD or SSD) and the amount of activity wearing the device are the typical factors involved when deciding to scrub a file-system. If you have some window dedicated to backup activity, then you'd want to run a recursive forward scrub with repair on the entire file-system before it is snapshotted and used for backup. Although you can run a scrub alongside active use of the file-system, it is always recommended that you run the scrub on a quiet file-system so that neither activity gets in the other's way. This also helps in completing the scrub task quicker.

3. How often should I scrub ~mdsdir?

~mdsdir is used to collect deleted (stray) entries. So, the number of file/dir unlinks in a typical workload should be used to come up with a heuristic to scrub the file-system. This activity can be taken up separately from scrubbing the file-system root.

4. Should I set up a cron job?

Yes, you could.
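A minimal sketch of such a cron entry, assuming the /etc/cron.d format and an MDS reachable as mds.0 (days, times and paths are placeholders; pick a quiet window for your own workload):

    # /etc/cron.d/cephfs-scrub -- weekly forward scrub of / and ~mdsdir
    0 2 * * 0  root  ceph tell mds.0 scrub start / recursive
    0 4 * * 0  root  ceph tell mds.0 scrub start '~mdsdir' recursive

The quotes around ~mdsdir are only there to stop the shell from attempting tilde expansion.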
5. Is filesystem scrubbing considered harmless? Even with recursive force repair?

Yes, scrubbing even with repair is harmless. Scrubbing with repair does the following things:
1. Repair backtrace: if the on-disk and in-memory backtraces don't match, then the DIRTYPARENT flag is set so that the journal logger thread picks the inode for writing the backtrace to disk.
2. Repair inode: if the on-disk and in-memory inode versions don't match, then the inode is left untouched. Otherwise, if the inode is marked as "free", the inode number is removed from active use.
3. Repair recursive-stats: if the on-disk and in-memory raw-stats don't match, then all the stats for the leaves in the directory tree are marked dirty and a scatter-gather operation is forced to coalesce the raw-stats info.

6. Is there any chance for scrubbing to overload the mds on a big filesystem, ie. like find . -ls?

Scrubbing on its own should not be able to overload an MDS, but it is an additional load on top of whatever client activity the MDS is serving, which could exceed the server's capacity. To put it in short, yes, it might overload the mds in sustained high-I/O scenarios. The mds config option mds_max_scrub_ops_in_progress, which defaults to 5, decides the number of scrubs running at any given time. So, there is a small effort at throttling.

7. What is the difference between "recursive repair" and "recursive force repair"? Is "force" harmless?

If the "force" argument is specified, then a dirfrag is scrubbed only if:
1. the dentry version is greater than the last scrub version, AND
2. the dentry type is a DIR.
If "force" is not specified, then dirfrag scrubbing is skipped, and you will be able to see an mds log line saying that scrubbing was skipped for the dentry. The rest of the scrubbing is done as described in Q5 above.

8. Is there any way to see at which file/folder the scrub operation is? In fact, any better way to see scrub progress than "scrub status", which doesn't say much.

Currently there's no way to see which file/folder is being scrubbed. At most we could log a line in the mds logs about it, but it could soon cause logs to bloat if the number of entries is large.
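For what it's worth, the knobs that do exist today (verify the exact names against the cephfs scrub documentation for your release) are roughly:

    # coarse progress/state of queued scrubs
    ceph tell mds.0 scrub status
    # pause, resume or abort a running scrub
    ceph tell mds.0 scrub pause
    ceph tell mds.0 scrub resume
    ceph tell mds.0 scrub abort
    # inspect or adjust the concurrency limit mentioned in answer 6
    ceph config get mds mds_max_scrub_ops_in_progress
    ceph config set mds mds_max_scrub_ops_in_progress 5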
_______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx