On 9/30/2023 15:18, Sebastian Reichel wrote:
Hi,
On Tue, Sep 26, 2023 at 05:59:54PM -0500, Mario Limonciello wrote:
On some systems AMD Navi3x dGPU triggers RAS errors on startup; but
only if the amdgpu kernel module is not part of the initramfs.
This is because the hardware is not properly programmed for the
AC/DC state of the system when it is loaded later in boot.
I don't understand the last sentence. As far as I can see
i2c_dw_pci_probe() either does not registers UCSI at all or
with the dGPU properties (and thus scope) set.
I'll explain it better below.
The AC/DC state of the system is incorrect specifically when UCSI power
supplies have been initialized. These power supplies are marked as
POWER_SUPPLY_SCOPE_UNKNOWN scope. As they're 'offline' the power
supply count is increased but the resultant return value is
power_supply_is_system_supplied() 0.
To fix this look explicitly for `POWER_SUPPLY_SCOPE_SYSTEM` power
supplies before incrementing the count. If no system power supply
is found then the system is assumed to be on AC.
Cc: stable@xxxxxxxxxxxxxxx
Tested-by: David Perry <David.Perry@xxxxxxx>
Fixes: 95339f40a8b6 ("power: supply: Fix logic checking if system is running from battery")
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
---
This effectively fully disables supply detection for UCSI, because
it is never set to POWER_SUPPLY_SCOPE_SYSTEM. Please fix the amdgpu
init part instead.
I don't think my commit message did a good job conveying why this is a
core bug. Let me try to add more detail.
This is an OEM system that has 3 USB type C ports. It's an Intel
system, but this doesn't matter for the issue.
* when ucsi_acpi is not loaded there are no power supplies in the system
and it reports power_supply_is_system_supplied() as AC.
* When ucsi_acpi is loaded 3 power supplies will be registered.
power_supply_is_system_supplied() reports as DC.
Now when you add in a Navi3x AMD dGPU to the system the power supplies
don't change. This particular dGPU model doesn't contain a USB-C port,
so there is no UCSI power supply registered.
As amdgpu is loaded it looks at device initialization whether the system
is powered by AC or DC. Here is how it looks:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-6.5.y#n3834
On the OEM system if amdgpu loads before the ucsi_acpi driver (such as
in the initramfs) then the right value is returned for
power_supply_is_system_supplied() - AC.
If amdgpu is loaded after the ucsi_acpi driver, the wrong value is
returned for power_supply_is_system_supplied() - DC.
This value is very important to set up the dGPU properly. If the wrong
value is returned, the wrong value will be notified to the hardware and
the hardware will not behave properly. On the OEM system this is a
"black screen" at bootup along with RAS errors emitted by the dGPU.
With no changes to a malfunctioning kernel or initramfs binaries I can
add modprobe.blacklist=ucsi_acpi to kernel command line avoid
registering those 3 power supplies and the system behaves properly.
So I think it's inappropriate for "UNKNOWN" scope power supplies to be
registered and treated as system supplies, at least as it pertains to
power_supply_is_system_supplied().
-- Sebastian
drivers/power/supply/power_supply_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
index d325e6dbc770..3de6e6d00815 100644
--- a/drivers/power/supply/power_supply_core.c
+++ b/drivers/power/supply/power_supply_core.c
@@ -349,7 +349,7 @@ static int __power_supply_is_system_supplied(struct device *dev, void *data)
unsigned int *count = data;
if (!psy->desc->get_property(psy, POWER_SUPPLY_PROP_SCOPE, &ret))
- if (ret.intval == POWER_SUPPLY_SCOPE_DEVICE)
+ if (ret.intval != POWER_SUPPLY_SCOPE_SYSTEM)
return 0;
(*count)++;
--
2.34.1