Hi, On 12/2/20 2:49 PM, Rafael J. Wysocki wrote: > On Sat, Nov 21, 2020 at 9:30 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote: >> >> Hi Rafael, >> >> A while ago (almost 2 years ago) I discussed an issue with you about >> some devices, where some of the methods used during device-addition >> (such as _HID) may rely on OpRegions of other devices: >> >> https://www.spinics.net/lists/linux-acpi/msg86303.html >> >> An example of this is the Acer Switch 10E SW3-016 model. The _HID method >> of the ACPI node for the UART attached Bluetooth, reads GPIOs to detect >> the installed wifi chip and update the _HID for the Bluetooth's ACPI node >> accordingly. The current ACPI scan code calls _HID before the GPIO >> controller's OpRegions are available, leading to the wrong _HID being >> used and Bluetooth not working. >> >> Last week I bought a second hand Acer device, not knowing it was this >> exact model. Since I now have access to affected hardware I decided to >> take a shot at fixing this. >> >> In the discussion you suggested to split the acpi_bus_scan of the root >> into 2 steps, first scan devices with an empty _DEP, putting other >> acpi_handle-s on a list of deferred devices and then in step 2 scan the >> rest. >> >> I'm happy to report that, at least on the affected device, this works >> nicely. While working on this I came up with a less drastic way to >> deal with this. As you will see in patch 4 of this series, I decided >> to first add a more KISS method of deciding which devices to defer >> to the second scan step by matching by HID. This has the disadvantage >> of not being a generic solution. But it has the advantage of being a >> solution which does not potentially regress other devices. >> >> Then in patch 5 I actually do add the option to defer or not based on >> _DEP being empty. I've put this behind a kernel commandline option as >> I'm not sure we should do this everywhere by default. At least no without >> a lot more testing. >> >> Patch 6 fixes an issue with patch 5 which causes battery devices to stop >> working. >> >> And patch 7 adds some extra HIDs to the list of HIDs which should be >> ignored when checking if the _DEP list is empty from Linux' pov, iow >> some extra HIDs which Linux does not bind to. >> >> Please let me know what you think about this patch-set. I would be happy >> to see just patches 1-4 merged. > > I took patches 1 and 2, because IMO they are generally useful (I > rewrote the changelogs to avoid mentioning the rest of the series > though), That is fine. Thanks for taking those. > but I have some reservations regarding the rest. > > First off, I'm not really sure if failing acpi_add_single_object() for > devices with missing dependencies is a good idea. IIRC there is > nothing in there that should depend on any opregions supplied by the > other devices, so it should be safe to allow it to complete. Actually acpi_add_single_object() does depend on ACPI methods which may depend on opregions, that is the whole reason why this series is necessary. Otherwise we could just delay the binding of the driver based in dep_unmet which would be easier. Here are 2 actual examples of acpi_add_single_object() calling ACPI methods which may depend on opregions: 1. acpi_add_single_object() calls acpi_init_device_object() which calls acpi_set_pnp_ids() which fills a bunch if fields of struct acpi_device with info returned by the acpi_get_object_info() call. Specifically it stores the value returned by the _HID method in the acpi_device_pnp array for the device and that _HID method is actually the problem in the example device which started this series. The _HID method of the bluetooth device reads 2 GPIOs to get a hw-id (0-3) and then translates the hwid to a _HID string. If the GPIO opregion's handlers have not registered yet then the reading of the GPIOs is correctly skipped, and hwid 0 is assumed, which is wrong in this case. 2. I've also seen examples where _STA depends on GPIOs in a similar manner; and acpi_add_single_object() calls acpi_bus_get_power_flags() which calls acpi_bus_init_power() which calls acpi_device_is_present() which depends on _STA results. > That, in > turn, will allow the flags in struct acpi_device to be used to mark > the "deferred" devices without allocating more memory. See above, we would need to either redo a bunch of the device addition; and/or potentially even undo it in the case of _STA changing from present -> not present once the OpRegions have registered. > Next, in theory, devices with dependencies may also appear during > hotplug, so it would be prudent to handle that on every invocation of > acpi_bus_scan() and not just when it runs for the root object. > > So my approach would be to allow the first namespace walk in > acpi_bus_scan() to complete, change acpi_bus_attach() to optionally > skip the devices with missing dependencies and return a result > indicating whether or not it has set flags.visited for any devices and > run it in a loop on the "root" device object until it says that no new > devices have been "attached". > > Let me cut a prototype patch for that and get back to you. See above for why the prototype patch will not work. By the time acpi_bus_attach() runs, the wrong HID has already been stored by acpi_set_pnp_ids(). Note I'm not saying that your approach cannot work, we could e.g. re-fetch the HID before running bus_attach. But once we start doing extra work, replacing fields in the earlier instantiated acpi_device, then there is not that much difference between the 2 approaches and then the question becomes which way is cleaner ? I still favor my own approach because that simply delays the entire instantiation, rather then doing it when we don't have all the _DEPs yet and then "patching up" the acpi_device later. Note I did consider the "patching up" approach, but I rejected it (*). The patching-up approach feels fragile / more error-prone to me, which is why I chose to simply defer the entire instantiation. Regards, Hans *) In hindsight I should have probably written that down somewhere, but the whole though process behind my choices it is not something which I could quickly describe in 1 or 2 paragraphs in the commit message. Some other notes about this: 1. I considered the patching-up approach, but I did not implement it. 2. Another reason for delaying the entire instantiation is that we have some quirk handling in various places which relies on _HID / on the pnp-ids and since those are still in flux when the deps have not been met yet, some of that quirk handling could (theoretically) also go wonky if we instantiate the device too soon.