Re: [PATCH] drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/11/2024 15:43, Mario Limonciello wrote:
On 12/11/2024 15:41, Alex Deucher wrote:
On Wed, Dec 11, 2024 at 3:19 PM Mario Limonciello <superm1@xxxxxxxxxx> wrote:

On 12/11/2024 14:08, Alex Deucher wrote:
On Wed, Dec 11, 2024 at 10:56 AM Mario Limonciello <superm1@xxxxxxxxxx> wrote:

From: Mario Limonciello <mario.limonciello@xxxxxxx>

If the kernel hasn't been compiled with PCIe hotplug support this
can lead to problems with dGPUs that use BOCO because they effectively
drop off the bus.

To prevent issues, disable BOCO support when compiled without PCIe hotplug.

Reported-by: Gabriel Marcano <gabemarcano@xxxxxxxxx>
Closes: https://gitlab.freedesktop.org/drm/amd/-/ issues/1707#note_2696862
Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>

Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>

Thx.


Seems like this should affect any device which supports d3cold.  Maybe
we want something more general as well?

Any specific ideas?  One of these two hunks I think make sense, leaning
upon the second one more strongly.

Actually, I wonder if the affected hardware pre-dates d3cold and uses
the old proprietary AMD ATPX interface to control dGPU power.  In that
case, the d3cold is managed by the driver rather than the PCI/ACPI
subsystems.  IIRC, there was a workaround in the PCIe hotplug code to
avoid calling the pci remove function when the driver powered down the
GPU via ATPX (or the nvidia equivalent).  If so, this check should go
in amdgpu_device_supports_px() instead.

Gabriel,

Can you please share a full kernel log so we can clarify which method your hardware uses?

Actually he shared it to the bug.  It's BOCO.

https://gitlab.freedesktop.org/-/project/4522/uploads/5714f4562de8ffeb7d38f70eb7d51d85/dmesg_with_flip_done

Dec 10 22:45:07 localhost kernel: amdgpu 0000:03:00.0: amdgpu: Using BOCO for runtime pm


Thanks,

Alex




                                    diff --git a/drivers/pci/pci.c
b/drivers/pci/pci.c
index 0b29ec6e8e5e2..01691f1d26fbe 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2751,6 +2751,10 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
          if (target_state == PCI_POWER_ERROR)
                  return -EIO;

+       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
+           target_state == PCI_D3cold)
+               return -EBUSY;
+
          pci_enable_wake(dev, target_state, wakeup);

          error = pci_set_power_state(dev, target_state);
@@ -2797,6 +2801,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
          if (target_state == PCI_POWER_ERROR)
                  return -EIO;

+       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
+            target_state == PCI_D3cold)
+               return -EBUSY;
+
          __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));

          error = pci_set_power_state(dev, target_state);

Alex

---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/ gpu/drm/amd/amdgpu/amdgpu_device.c
index 764d41434a82f..7db796ebb967e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -419,6 +419,9 @@ bool amdgpu_device_supports_boco(struct drm_device *dev)
   {
          struct amdgpu_device *adev = drm_to_adev(dev);

+       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
+               return false;
+
          if (adev->has_pr3 ||
              ((adev->flags & AMD_IS_PX) && amdgpu_is_atpx_hybrid()))
                  return true;
--
2.43.0







[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux