Re: [PATCH] drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/11/2024 16:16, Gabriel Marcano wrote:
  >On 12/11/2024 15:41, Alex Deucher wrote:
On Wed, Dec 11, 2024 at 3:19 PM Mario Limonciello <superm1@xxxxxxxxxx> wrote:

On 12/11/2024 14:08, Alex Deucher wrote:
On Wed, Dec 11, 2024 at 10:56 AM Mario Limonciello <superm1@xxxxxxxxxx> wrote:

From: Mario Limonciello <mario.limonciello@xxxxxxx>

If the kernel hasn't been compiled with PCIe hotplug support this
can lead to problems with dGPUs that use BOCO because they effectively
drop off the bus.

To prevent issues, disable BOCO support when compiled without PCIe hotplug.

Reported-by: Gabriel Marcano <gabemarcano@xxxxxxxxx>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>

Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>

Thx.


Seems like this should affect any device which supports d3cold.  Maybe
we want something more general as well?

Any specific ideas?  One of these two hunks I think make sense, leaning
upon the second one more strongly.
Actually, I wonder if the affected hardware pre-dates d3cold and uses
the old proprietary AMD ATPX interface to control dGPU power.  In that
case, the d3cold is managed by the driver rather than the PCI/ACPI
subsystems.  IIRC, there was a workaround in the PCIe hotplug code to
avoid calling the pci remove function when the driver powered down the
GPU via ATPX (or the nvidia equivalent).  If so, this check should go
in amdgpu_device_supports_px() instead.

Gabriel,

Can you please share a full kernel log so we can clarify which method
your hardware uses?


Sure thing. I am attaching a kernel output from last night (it actually crashed
what looks to be the renoir APU as I tried to turn off the computer, which
shows up in the logs towards the end).

Some caveats about my system:
  - I'm using some modified ACPI tables:
    - I've tweaked some WMI-related WMAX code (read/write GPIO for RGB controller)
    - I've fixed a missing symbol issue (renamed _EC0 to __EC)
    - Fixed a bunch of other warnings reported by iasl
  - I have `#define DEBUG 1` in amdgpu_drv.c
  - I have a patch from https://bugzilla.kernel.org/show_bug.cgi?id=215884
    applied
  - My kernel is using Gentoo patches

Looking at my dmesg output, it looks like I'm using ATPX:
[  +0.000022] amdgpu: vga_switcheroo: detected switching method
                       \_SB_.PCI0.GP17.VGA_.ATPX handle
[  +0.001561] amdgpu: ATPX version 1, functions 0x00000001
[  +0.000120] amdgpu: ATPX Hybrid Graphics

Also I see this in my ACPI table dissasembly:
   Scope (\_SB.PCI0.GP17.VGA)
   {
       Name (M189, Buffer (0x0100){})
       Name (M190, Ones)
       Name (M191, Ones)
       Method (ATPX, 2, Serialized)
       {


If you need me to recompile the kernel and/or disable my changes to my ACPI
tables, let me know.

Your log also reports this though:

amdgpu 0000:03:00.0: amdgpu: Using BOCO for runtime pm



Thanks,

Gabriel


Thanks,
Alex



                                     diff --git a/drivers/pci/pci.c
b/drivers/pci/pci.c
index 0b29ec6e8e5e2..01691f1d26fbe 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2751,6 +2751,10 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
           if (target_state == PCI_POWER_ERROR)
                   return -EIO;

+       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
+           target_state == PCI_D3cold)
+               return -EBUSY;
+
           pci_enable_wake(dev, target_state, wakeup);

           error = pci_set_power_state(dev, target_state);
@@ -2797,6 +2801,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
           if (target_state == PCI_POWER_ERROR)
                   return -EIO;

+       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
+            target_state == PCI_D3cold)
+               return -EBUSY;
+
           __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));

           error = pci_set_power_state(dev, target_state);

Alex

---
    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
    1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 764d41434a82f..7db796ebb967e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -419,6 +419,9 @@ bool amdgpu_device_supports_boco(struct drm_device *dev)
    {
           struct amdgpu_device *adev = drm_to_adev(dev);

+       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
+               return false;
+
           if (adev->has_pr3 ||
               ((adev->flags & AMD_IS_PX) && amdgpu_is_atpx_hybrid()))
                   return true;
--
2.43.0








[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux