Re: [PATCH v3 2/2] usb: typec: ucsi: Implement ChromeOS UCSI driver

Guenter Roeck <groeck@xxxxxxxxxx> · Mon, 8 Apr 2024 06:04:22 -0700

On Thu, Apr 4, 2024 at 6:30 AM Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
[ ... ]

> > > > if (WARN_ON_ONCE(val_len > MAX_EC_DATA_SIZE))
> > > >   return -EINVAL;
> > >
> > > So if you trigger this, you just rebooted all boxes that have
> > > panic-on-warn enabled (hint, the HUGE majority in quantity of Linux
> > > systems out there.)
> > >
> > > So don't do that, just handle it like this.
> >
> > Does that mean that we should not use WARN at all? What is the best
> > current practice for WARN usage?
>
> To never use it.  Handle the issue and recover properly.
>
> > I'm asking because for me this looks like a perfect usecase. If I were
> > at the positiion of the driver developer, I'd like to know the whole
> > path leading to the bad call, not just the fact that the function was
> > called with the buffer being too big.
>
> Then use ftrace if you are a driver developer, don't crash users boxes
> please.
>
> If you REALLY need a traceback, then provide that, but do NOT use WARN()
> for just normal debugging calls that you want to leave around in the
> system for users to trip over.
>

That is not common practice.

$ git grep WARN_ON drivers/gpu | wc
   3004   11999  246545
$ git grep WARN_ON drivers/net/ | wc
   3679   14564  308230
$ git grep WARN_ON drivers/net/wireless | wc
   1985    8112  166081

We get hundreds of thousands of reports with warning backtraces from
Chromebooks in the field _every single day_. Most of those are from
drm and wireless subsystems. We even had to scale back the percentage
of reported warning backtraces because the large volume overwhelmed
the reporting system. When approached about it, developers usually
respond with "this backtrace is absolutely necessary", but nothing
ever happens to fix the reported problems. In practice, they are just
ignored.

This means that any system using drm or wireless interfaces just can
not really enable panic-on-warn because that would crash the system
all the time.

Guenter