On Sun, 02 Oct 2022 10:23:07 +0200,
Artem S. Tashkinov wrote:
>
> On 10/2/22 07:37, Takashi Iwai wrote:
> > On Sat, 01 Oct 2022 12:30:22 +0200,
> > Artem S. Tashkinov wrote:
> >> - 2 -
> >>
> >> Here's another one which is outright puzzling:
> >>
> >> You run: dmesg -t --level=emerg,crit,err
> >>
> >> And you see some non-descript errors of some kernel subsystems seemingly
> >> failing or being unhappy about your hardware. Errors are as cryptic as
> >> humanly possible, you don't even know what part of kernel has produced them.
> >>
> >> OK, as a "power" user I download the kernel source, run `grep -R message
> >> /tmp/linux-5.19` and there are _multiple_ different modules and places
> >> which contain this message.
> >>
> >> I'm lost. Send this to LKML? Did that in the long past, no one cared, I
> >> stopped.
> >>
> >> Here's what I'm getting with Linux 5.19.12:
> >>
> >> platform wdat_wdt: failed to claim resource 5: [mem 0x00000000-0xffffffff7fffffff]
> >> ACPI: watchdog: Device creation failed: -16
> >> ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.XHC.RHUB.TPLD], AE_NOT_FOUND (20220331/psargs-330)
> >> ACPI Error: Aborting method \_SB.UBTC.CR01._PLD due to previous error (AE_NOT_FOUND) (20220331/psparse-529)
> >> platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
> >> acpi MSFT0101:00: platform device creation failed: -16
> >> lis3lv02d: unknown sensor type 0x0
> >>
> >> Are they serious? Should they be reported or not? Is my laptop properly
> >> working? I have no clue at all.
> >
> > That's a dilemma. The kernel can't know whether it's "properly"
> > working, either -- that is, whether the lack of some functions matters
> > for you or not. In your case above, it's about a watchdog, something
> > related to USB, the TPM, and an acceleration sensor, all of which likely
> > come from a buggy BIOS. Would you mind if those features are missing?
> > Or even whether your device has a correct hardware implementation?
> > The kernel doesn't know, hence it reports it as an error.
> >
> > In many drivers, there are mechanisms to shut off superfluous error
> > messages for known devices. So the solutions are case-by-case.
> >
> > Or you can completely hide those errors at boot with a boot option
> > (e.g. loglevel=2).
>
> The problem is some of such messages are indeed indicative of certain
> real issues which result in HW not working properly, including:
>
> 1) missing/incorrect firmware
> 2) most importantly: not enabled power saving modes
> 3) not enabled high performance modes
> 4) not enabled devices
> 5) not enabled devices' functions
> 6) drivers conflicts (i.e. the wrong module gets loaded for the device)
> 7) physically failing hardware
>
> I'm quite sure you don't really know what half of those messages
> actually mean.

Of course: not because those messages are hard to understand, but
because they indicate only the cause, and the exact end result can't
always be known to the kernel at that point. Physically failing
hardware? A device not enabled? Who knows. There might even be an
alternative, such as a user-space driver.

> Speaking of 7. Various kernel subsystems/drivers deal with e.g. mass
> storage which is known to fail quite often. There's not a single driver
> in the kernel which is actually brave enough to spew something like this:
>
> "/dev/xxxx might be failing, please RMA or seek help online"
>
> instead you get a dmesg chock-full of "unable to read sector XXX" or
> something like that.

Oh, so you suggest that we should add "please RMA or seek help online"
to every printk at KERN_ERR level, as if that would save the world? ;)

IMO, what matters for users is whether the system works or not, not
how the kernel messages appear.
A kernel message may help with diagnosis, but the message itself is no
solution; that is, what matters most about a kernel message is that it
indicates a real error that developers can diagnose. If the end effect
is fairly certain, a message may be chattier. OTOH, people get annoyed
by excessive verbosity, too. So it needs a sensible balance.

> To return to the previous errors: it's impossible for the user to assess
> their severity and that sucks.

Right, that's why I wrote it's a dilemma.

> What is "platform device creation
> failed"? What is "unknown sensor type"? What am I missing? Who's
> responsible? The kernel? My HW vendor? Are those errors actionable?

All of those depend on the driver implementation and the hardware
implementation. Unfortunately, there is no general answer at all.

> In my understanding a properly working computer must not produce
> "emerg,crit,err" errors. I'm not even talking about "warn,info" and such.

Yes, some errors can be downgraded to warn or even info. I myself find
ACPI way too chatty, too. So I believe one thing we can improve is to
define a clearer guideline for KERN_ERR-level errors.

Takashi
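For reference, the `dmesg --level=emerg,crit,err` command and the loglevel=2 boot option discussed in this thread both act on the numeric levels 0 (emerg) through 7 (debug) defined in include/linux/kern_levels.h. The kernel prints a record on the console only when its level is numerically lower than the console loglevel (ignoring the ignore_loglevel option), yet every record still lands in the ring buffer that dmesg reads. The following is a minimal Python sketch of those two rules, under the assumption that input lines carry the "<N>" priority prefix that `dmesg --raw` leaves in place; the helper names and sample lines are hypothetical, not kernel or util-linux code.

```python
import re
from typing import Iterable, List, Optional

# Level names accepted by `dmesg --level`; the list index is the numeric
# level (KERN_EMERG=0 ... KERN_DEBUG=7 in include/linux/kern_levels.h).
LEVEL_NAMES = ["emerg", "alert", "crit", "err", "warn", "notice", "info", "debug"]

# Raw records printed by `dmesg --raw` keep a "<N>" priority prefix.
_RAW_PREFIX = re.compile(r"^<(\d+)>")


def shown_on_console(level: int, console_loglevel: int) -> bool:
    """Console rule (ignoring ignore_loglevel): print only if the level is
    numerically lower than the console loglevel, e.g. loglevel=2."""
    return level < console_loglevel


def parse_raw_level(line: str) -> Optional[int]:
    """Return the level from a '<N>...' line, or None if there is no prefix.

    N is a syslog priority (facility * 8 + level); kernel messages use
    facility 0, so masking with 7 recovers the level in every case.
    """
    match = _RAW_PREFIX.match(line)
    return int(match.group(1)) & 7 if match else None


def select(lines: Iterable[str], names: Iterable[str]) -> List[str]:
    """Keep lines whose level is listed in `names`, like `dmesg --level=...`."""
    wanted = {LEVEL_NAMES.index(name) for name in names}
    return [line for line in lines if parse_raw_level(line) in wanted]


if __name__ == "__main__":
    # Hypothetical raw lines: the first message text comes from the log
    # quoted above, but both prefixes and the second message are made up.
    sample = [
        "<3>lis3lv02d: unknown sensor type 0x0",
        "<6>some informational message",
    ]
    print(select(sample, ["emerg", "crit", "err"]))
    # Levels that would still reach the console with loglevel=2:
    print([name for lvl, name in enumerate(LEVEL_NAMES) if shown_on_console(lvl, 2)])
```

Under these assumptions, loglevel=2 lets only emerg and alert records reach the console, so err-level lines such as the ACPI errors above vanish from the boot screen while still showing up in dmesg.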