On Thu, Mar 14, 2024 at 8:26 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote: > > > > 在 2024/3/15 03:47, Filipe Manana 写道: > > On Thu, Mar 14, 2024 at 9:54 AM Qu Wenruo <wqu@xxxxxxxx> wrote: > >> > >> Currently when we increase the device statistics, it would always lead > >> to an error message in the kernel log. > >> > >> I would argue this behavior is not ideal: > >> > >> - It would flood the dmesg and bury real important messages > >> One common scenario is scrub. > >> If scrub hit some errors, it would cause both scrub and > >> btrfs_dev_stat_inc_and_print() to print error messages. > >> > >> And in that case, btrfs_dev_stat_inc_and_print() is completely > >> useless. > >> > >> - The results of btrfs_dev_stat_inc_and_print() is mostly for history > >> monitoring, doesn't has enough details > >> > >> If we trigger the errors during regular read, such messages from > >> btrfs_dev_stat_inc_and_print() won't help us to locate the cause > >> either. > >> > >> The real usage for the btrfs device statistics is for some user space > >> daemon to check if there is any new errors, acting like some checks on > >> SMART, thus we don't really need/want those messages in dmesg. > >> > >> This patch would reduce the log level to debug (disabled by default) for > >> btrfs_dev_stat_inc_and_print(). > >> For users really want to utilize btrfs devices statistics, they should > >> go check "btrfs device stats" periodically, and we should focus the > >> kernel error messages to more important things. > > > > Not sure if this is the right thing to do. > > > > In the scrub context it can be annoying for sure. > > Other cases I'm not so sure about, because having error messages in > > dmesg/syslog may help notice issues more quickly. > > For non-scrub cases, I'd argue we already have enough output: > > No matter if the error is fixed or not, every time a mirror got csum > mismatch or other errors, we already have error message output: > > Data: btrfs_print_data_csum_error() > Meta: btrfs_validate_extent_buffer() > > For repaired ones, we have extra output from bio layer for both metadata > and data: > btrfs_repair_io_failure() > > So I'd say the dev_stat ones are already duplicated. Ok, I suppose it's fine then. Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> Thanks. > > Thanks, > Qu > > > >> > >> CC: stable@xxxxxxxxxxxxxxx > >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > >> --- > >> fs/btrfs/volumes.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > >> index e49935a54da0..126145950ed3 100644 > >> --- a/fs/btrfs/volumes.c > >> +++ b/fs/btrfs/volumes.c > >> @@ -7828,7 +7828,7 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index) > >> > >> if (!dev->dev_stats_valid) > >> return; > >> - btrfs_err_rl_in_rcu(dev->fs_info, > >> + btrfs_debug_rl_in_rcu(dev->fs_info, > >> "bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u", > >> btrfs_dev_name(dev), > >> btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS), > >> -- > >> 2.44.0 > >> > >> > >