Re: [PATCH v3 2/2] usb: typec: ucsi: Implement ChromeOS UCSI driver

Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx> · Mon, 8 Apr 2024 20:12:16 +0300

On Mon, Apr 08, 2024 at 06:04:22AM -0700, Guenter Roeck wrote:
> On Thu, Apr 4, 2024 at 6:30 AM Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> [ ... ]
> 
> > > > > if (WARN_ON_ONCE(val_len > MAX_EC_DATA_SIZE))
> > > > >   return -EINVAL;
> > > >
> > > > So if you trigger this, you just rebooted all boxes that have
> > > > panic-on-warn enabled (hint, the HUGE majority in quantity of Linux
> > > > systems out there.)
> > > >
> > > > So don't do that, just handle it like this.
> > >
> > > Does that mean that we should not use WARN at all? What is the best
> > > current practice for WARN usage?
> >
> > To never use it.  Handle the issue and recover properly.
> >
> > > I'm asking because for me this looks like a perfect use case. If I were
> > > in the position of the driver developer, I'd like to know the whole
> > > path leading to the bad call, not just the fact that the function was
> > > called with the buffer being too big.
> >
> > Then use ftrace if you are a driver developer, don't crash users' boxes
> > please.
> >
> > If you REALLY need a traceback, then provide that, but do NOT use WARN()
> > for just normal debugging calls that you want to leave around in the
> > system for users to trip over.
> >
> 
> That is not common practice.
> 
> $ git grep WARN_ON drivers/gpu | wc
>    3004   11999  246545
> $ git grep WARN_ON drivers/net/ | wc
>    3679   14564  308230
> $ git grep WARN_ON drivers/net/wireless | wc
>    1985    8112  166081
> 
> We get hundreds of thousands of reports with warning backtraces from
> Chromebooks in the field _every single day_. Most of those are from
> drm and wireless subsystems. We even had to scale back the percentage
> of reported warning backtraces because the large volume overwhelmed
> the reporting system. When approached about it, developers usually
> respond with "this backtrace is absolutely necessary", but nothing
> ever happens to fix the reported problems. In practice, they are just
> ignored.

That's sad.

> 
> This means that any system using drm or wireless interfaces cannot
> really enable panic-on-warn because that would crash the system
> all the time.

And this is good from my point of view. If I remember correctly,
panic-on-warn was initially added to simplify debugging of warnings
rather than to disallow using WARN_ON(). The system is not supposed to
continue running after BUG(), so panic/reset on BUG is a safe approach.
But WARN is different. It means that the system was able to cope with
the problem, and as such there is no need to panic. Whoever enabled
panic-on-warn is doing a strange thing from my POV.
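
To make the distinction concrete, here is a minimal userspace sketch
(not kernel code) of the behaviour being discussed: a check that reports
an oversized length once, returns -EINVAL so the caller can recover, and
only escalates when a panic-on-warn style switch is set. All model_*
names are hypothetical, and the value of MAX_EC_DATA_SIZE is an
assumption made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed value for illustration; the real limit is defined by the driver. */
#define MAX_EC_DATA_SIZE 256

/* Hypothetical stand-in for the panic_on_warn setting. */
bool model_panic_on_warn;
/* How many warnings were reported, and how many would have panicked. */
int model_warn_count;
int model_panic_count;

/*
 * Rough model of a warn-once check: report the first time the condition
 * is true, but always hand the condition back so the caller can recover.
 */
bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
	assert(warned != NULL);

	if (cond && !*warned) {
		*warned = true;
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
		if (model_panic_on_warn)
			model_panic_count++; /* a real kernel would panic here */
	}
	return cond;
}

/* The length check from the patch, expressed against the model above. */
int model_check_len(size_t val_len)
{
	static bool warned;

	if (model_warn_on_once(val_len > MAX_EC_DATA_SIZE, &warned,
			       "val_len > MAX_EC_DATA_SIZE"))
		return -EINVAL;
	return 0;
}
```

In this model, an oversized call always gets -EINVAL back, and the
report is emitted only on the first hit. The call is fatal only if
model_panic_on_warn was set beforehand, which is the policy choice I am
questioning above.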

-- 
With best wishes
Dmitry