https://bugzilla.kernel.org/show_bug.cgi?id=204807

Matthew Garrett (mjg59-kernel@xxxxxxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |INVALID

--- Comment #37 from Matthew Garrett (mjg59-kernel@xxxxxxxxxxxxx) ---
Here's the situation. Your ACPI tables declare that your system firmware
may access the addresses associated with your IO sensors. We have no idea
what your firmware may do here - it may do nothing (in which case
accessing the addresses is completely safe), or it may use them for its
own internal monitoring.

Sensor hardware frequently uses indexed addressing, which means that
accessing a sensor requires something like the following:

1) Write the desired sensor to the index register
2) Read the sensor value from the data register

These can't occur simultaneously, so if both the OS and the firmware are
accessing it you risk ending up with something like:

1) Write sensor A to the index register (from the OS)
2) Write sensor B to the index register (from the firmware)
3) Read the sensor value from the data register (returns the value of
   sensor B to the firmware)
4) Read the sensor value from the data register (returns the value of
   sensor B to the OS)

The OS asked for the value of sensor A, but received the value of sensor
B. From the OS side this is probably not a big deal (you get a weird
value in your graphing), but if it happens the other way around the
firmware may decide that the system is running out of spec and shut it
down to avoid damage. This is not a good user experience.

Why does Windows not have the same problem? Well, in the general case
there's nothing stopping it from doing so. Vendor tooling usually takes
one of two approaches:

1) They don't use the hardware sensors directly, they use firmware
   interfaces to them. This is alluded to in comment #31 - on Asus
   systems, the sensors are available via a WMI interface.
   Using a firmware interface ensures that the firmware knows what the
   state of the hardware is, and avoids any race conditions. Your board
   may well support an alternative firmware interface and Linux simply
   lacks driver support for it. If so, I'm afraid that the correct
   solution is to add that driver support. Given that this bug has ended
   up covering boards from multiple vendors, it's no longer the correct
   place to handle that, though.

2) The vendor knows that the firmware makes no policy decisions based on
   the sensor values, so it's safe to access the resources even though
   the firmware declares that it uses them. The problem with this
   approach is that *we* have no way of knowing that it's safe, and the
   consequences of it being unsafe include data loss. Given the choice
   between users being able to look at system temperatures and users not
   losing data, we choose to prioritise users not losing data.

Looking at your ACPI tables, we see the following:

    Name (IOHW, 0x0290)
    OperationRegion (SHWM, SystemIO, IOHW, 0x0A)
    Field (SHWM, ByteAcc, NoLock, Preserve)
    {
        Offset (0x05),
        HIDX,   8,
        HDAT,   8
    }

This means that there's a region of IO ports starting at address 0x290
and 0x0a addresses long. This is the same region of port IO that your
sensor chip uses. Within that address range, we declare that 0x295 is
called HIDX, and 0x296 is called HDAT. This is consistent with an index
and data register as described above, which means that having the OS
access this space directly is likely to race with the firmware (ie, it's
dangerous).

Near here are two methods called RHWM and WHWM. At a guess, that's "Read
Hardware Monitoring" and "Write Hardware Monitoring". These not only
access the sensors via the registers described above, they do some
additional hardware access around it.
This is further evidence to support there being some handshaking
involved to avoid race conditions - the firmware takes a mutex and
appears to hit some other register that may also be used to guard
against racing against system management mode. We really, *really* want
to be using the firmware methods here rather than touching the sensor
chip directly. At this point, direct access isn't so much walking past a
sign saying "Danger, keep out", it's a sign saying "Proceed no further or
you will die slowly and it will hurt the entire time".

RHWM is referenced from the WMBD method if the first argument to it is
RHWM, and WHWM is referenced if the argument is WHWM. WMBD is the WMI
dispatcher for the WMI function with identifier "BD" - looking at your
_WDG object, which describes the available WMI interfaces, we have the
following:

    Name (_WDG, Buffer (0x50)
    {
        /* 0000 */  0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11,  // .^..mN..
        /* 0008 */  0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66,  // .9.. ..f
        /* 0010 */  0x42, 0x43, 0x01, 0x02, 0xA0, 0x47, 0x67, 0x46,  // BC...GgF
        /* 0018 */  0xEC, 0x70, 0xDE, 0x11, 0x8A, 0x39, 0x08, 0x00,  // .p...9..
        /* 0020 */  0x20, 0x0C, 0x9A, 0x66, 0x42, 0x44, 0x01, 0x02,  //  ..fBD..
        /* 0028 */  0x72, 0x0F, 0xBC, 0xAB, 0xA1, 0x8E, 0xD1, 0x11,  // r.......
        /* 0030 */  0x00, 0xA0, 0xC9, 0x06, 0x29, 0x10, 0x00, 0x00,  // ....)...
        /* 0038 */  0xD2, 0x00, 0x01, 0x08, 0x21, 0x12, 0x90, 0x05,  // ....!...
        /* 0040 */  0x66, 0xD5, 0xD1, 0x11, 0xB2, 0xF0, 0x00, 0xA0,  // f.......
        /* 0048 */  0xC9, 0x06, 0x29, 0x10, 0x4D, 0x4F, 0x01, 0x00   // ..).MO..
    })

The format of _WDG is 16 bytes of GUID, 2 bytes of ID or notification
data, 1 byte of instance count and 1 byte of flags. The GUID used by
asus-wmi corresponds to the first GUID in this file,
97845ED0-4E6D-11DE-8A39-0800200C9A66. That has an ID of 0x4243, or BC -
ie, it's not the GUID we're looking for. The next GUID, however,
(466747a0-70ec-11de-8a39-0800200c9a66) has an identifier of 0x4244, or
BD.
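
To make the 20-byte entry layout concrete, here is a rough Python sketch
that splits the _WDG buffer quoted above into its entries. It is only an
illustration, not code from asus-wmi or the kernel: the names WDG,
ENTRY_SIZE and parse_wdg are made up for this example. The one assumption
worth flagging is the GUID byte order: the first three GUID fields are
stored little-endian in the buffer (which is why the bytes D0 5E 84 97
print as 97845ED0), and that is the layout uuid.UUID(bytes_le=...)
expects.

```python
import uuid

# The 0x50-byte _WDG buffer from the ACPI dump above, as raw bytes.
WDG = bytes([
    0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11,
    0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66,
    0x42, 0x43, 0x01, 0x02, 0xA0, 0x47, 0x67, 0x46,
    0xEC, 0x70, 0xDE, 0x11, 0x8A, 0x39, 0x08, 0x00,
    0x20, 0x0C, 0x9A, 0x66, 0x42, 0x44, 0x01, 0x02,
    0x72, 0x0F, 0xBC, 0xAB, 0xA1, 0x8E, 0xD1, 0x11,
    0x00, 0xA0, 0xC9, 0x06, 0x29, 0x10, 0x00, 0x00,
    0xD2, 0x00, 0x01, 0x08, 0x21, 0x12, 0x90, 0x05,
    0x66, 0xD5, 0xD1, 0x11, 0xB2, 0xF0, 0x00, 0xA0,
    0xC9, 0x06, 0x29, 0x10, 0x4D, 0x4F, 0x01, 0x00,
])

# 16 bytes GUID + 2 bytes ID/notification data + 1 byte instances + 1 byte flags
ENTRY_SIZE = 20


def parse_wdg(buf):
    """Split a _WDG buffer into (guid, id_bytes, instance_count, flags) tuples."""
    entries = []
    for off in range(0, len(buf) - ENTRY_SIZE + 1, ENTRY_SIZE):
        chunk = buf[off:off + ENTRY_SIZE]
        # bytes_le handles the little-endian first three GUID fields.
        guid = str(uuid.UUID(bytes_le=chunk[:16]))
        id_bytes = chunk[16:18]
        entries.append((guid, id_bytes, chunk[18], chunk[19]))
    return entries


for guid, id_bytes, instances, flags in parse_wdg(WDG):
    # Show the ID both as hex and as raw bytes (ASCII for method entries).
    print(guid, id_bytes.hex(), id_bytes, instances, hex(flags))
```

Run against this buffer, the second entry is the one whose ID bytes are
42 44 ("BD"), matching the WMBD dispatcher discussed above.
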
So this is the GUID we're looking for. Unfortunately asus-wmi doesn't
handle this GUID, so new code will need to be written.

I'm going to close this bug again because it's turned into a generic bug
covering different motherboard vendors, and there's no one size fits all
solution. For your case the correct way to handle it is for someone to
write a driver that uses the 466747a0-70ec-11de-8a39-0800200c9a66
interface to expose the sensor data. I'm afraid I don't have relevant
hardware so can't do this myself, but please do open another bug for
that.

tl;dr - the kernel message you're seeing is correct. Avoiding it requires
a new driver to be written. If you *personally* feel safe in ignoring the
risks, you can pass the acpi_enforce_resources=lax option, but that can't
be the default because it's unsafe in the general case, and so it isn't
the solution to the wider problem.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.