Re: [PATCH 2/7] btrfs: reduce the log level for btrfs_dev_stat_inc_and_print()

Filipe Manana <fdmanana@xxxxxxxxxx> · Mon, 18 Mar 2024 19:54:08 +0000

On Thu, Mar 14, 2024 at 8:26 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
>
>
> 在 2024/3/15 03:47, Filipe Manana 写道:
> > On Thu, Mar 14, 2024 at 9:54 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
> >>
> >> Currently when we increase the device statistics, it would always lead
> >> to an error message in the kernel log.
> >>
> >> I would argue this behavior is not ideal:
> >>
> >> - It would flood the dmesg and bury real important messages
> >>    One common scenario is scrub.
> >>    If scrub hit some errors, it would cause both scrub and
> >>    btrfs_dev_stat_inc_and_print() to print error messages.
> >>
> >>    And in that case, btrfs_dev_stat_inc_and_print() is completely
> >>    useless.
> >>
> >> - The results of btrfs_dev_stat_inc_and_print() is mostly for history
> >>    monitoring, doesn't has enough details
> >>
> >>    If we trigger the errors during regular read, such messages from
> >>    btrfs_dev_stat_inc_and_print() won't help us to locate the cause
> >>    either.
> >>
> >> The real usage for the btrfs device statistics is for some user space
> >> daemon to check if there is any new errors, acting like some checks on
> >> SMART, thus we don't really need/want those messages in dmesg.
> >>
> >> This patch would reduce the log level to debug (disabled by default) for
> >> btrfs_dev_stat_inc_and_print().
> >> For users really want to utilize btrfs devices statistics, they should
> >> go check "btrfs device stats" periodically, and we should focus the
> >> kernel error messages to more important things.
> >
> > Not sure if this is the right thing to do.
> >
> > In the scrub context it can be annoying for sure.
> > Other cases I'm not so sure about, because having error messages in
> > dmesg/syslog may help notice issues more quickly.
>
> For non-scrub cases, I'd argue we already have enough output:
>
> No matter if the error is fixed or not, every time a mirror got csum
> mismatch or other errors, we already have error message output:
>
>   Data: btrfs_print_data_csum_error()
>   Meta: btrfs_validate_extent_buffer()
>
> For repaired ones, we have extra output from bio layer for both metadata
> and data:
>   btrfs_repair_io_failure()
>
> So I'd say the dev_stat ones are already duplicated.

Ok, I suppose it's fine then.

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>

Thanks.

>
> Thanks,
> Qu
> >
> >>
> >> CC: stable@xxxxxxxxxxxxxxx
> >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> >> ---
> >>   fs/btrfs/volumes.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >> index e49935a54da0..126145950ed3 100644
> >> --- a/fs/btrfs/volumes.c
> >> +++ b/fs/btrfs/volumes.c
> >> @@ -7828,7 +7828,7 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index)
> >>
> >>          if (!dev->dev_stats_valid)
> >>                  return;
> >> -       btrfs_err_rl_in_rcu(dev->fs_info,
> >> +       btrfs_debug_rl_in_rcu(dev->fs_info,
> >>                  "bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
> >>                             btrfs_dev_name(dev),
> >>                             btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS),
> >> --
> >> 2.44.0
> >>
> >>
> >