On Mon, Apr 08, 2024 at 06:04:22AM -0700, Guenter Roeck wrote: > On Thu, Apr 4, 2024 at 6:30 AM Greg Kroah-Hartman > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > [ ... ] > > > > > > if (WARN_ON_ONCE(val_len > MAX_EC_DATA_SIZE)) > > > > > return -EINVAL; > > > > > > > > So if you trigger this, you just rebooted all boxes that have > > > > panic-on-warn enabled (hint, the HUGE majority in quantity of Linux > > > > systems out there.) > > > > > > > > So don't do that, just handle it like this. > > > > > > Does that mean that we should not use WARN at all? What is the best > > > current practice for WARN usage? > > > > To never use it. Handle the issue and recover properly. > > > > > I'm asking because for me this looks like a perfect usecase. If I were > > > at the positiion of the driver developer, I'd like to know the whole > > > path leading to the bad call, not just the fact that the function was > > > called with the buffer being too big. > > > > Then use ftrace if you are a driver developer, don't crash users boxes > > please. > > > > If you REALLY need a traceback, then provide that, but do NOT use WARN() > > for just normal debugging calls that you want to leave around in the > > system for users to trip over. > > > > That is not common practice. > > $ git grep WARN_ON drivers/gpu | wc > 3004 11999 246545 > $ git grep WARN_ON drivers/net/ | wc > 3679 14564 308230 > $ git grep WARN_ON drivers/net/wireless | wc > 1985 8112 166081 > > We get hundreds of thousands of reports with warning backtraces from > Chromebooks in the field _every single day_. Most of those are from > drm and wireless subsystems. We even had to scale back the percentage > of reported warning backtraces because the large volume overwhelmed > the reporting system. When approached about it, developers usually > respond with "this backtrace is absolutely necessary", but nothing > ever happens to fix the reported problems. In practice, they are just > ignored. That's sad. > > This means that any system using drm or wireless interfaces just can > not really enable panic-on-warn because that would crash the system > all the time. And this is good from my point of view. If I remember correctly, initially panic-on-warn was added to simplify debugging of the warnings rather than to disallow using WARN_ON(). The system is not supposed to continue running after BUG(), so panic/reset on BUG is a safe approach. But the WARN is different. It means that the system was able to cope with it. And as such there is no need to panic. Whoever enabled panic-on-warn is doing a strange thing from my POV. -- With best wishes Dmitry