Re: [PATCH] drm/amd: Require CONFIG_HOTPLUG_PCI_PCIE for BOCO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 >On 12/11/2024 15:41, Alex Deucher wrote:
>> On Wed, Dec 11, 2024 at 3:19 PM Mario Limonciello <superm1@xxxxxxxxxx> wrote:
>>>
>>> On 12/11/2024 14:08, Alex Deucher wrote:
>>>> On Wed, Dec 11, 2024 at 10:56 AM Mario Limonciello <superm1@xxxxxxxxxx> wrote:
>>>>>
>>>>> From: Mario Limonciello <mario.limonciello@xxxxxxx>
>>>>>
>>>>> If the kernel hasn't been compiled with PCIe hotplug support this
>>>>> can lead to problems with dGPUs that use BOCO because they effectively
>>>>> drop off the bus.
>>>>>
>>>>> To prevent issues, disable BOCO support when compiled without PCIe hotplug.
>>>>>
>>>>> Reported-by: Gabriel Marcano <gabemarcano@xxxxxxxxx>
>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707#note_2696862
>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
>>>>
>>>> Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>
>>>
>>> Thx.
>>>
>>>>
>>>> Seems like this should affect any device which supports d3cold.  Maybe
>>>> we want something more general as well?
>>>
>>> Any specific ideas?  One of these two hunks I think make sense, leaning
>>> upon the second one more strongly.
>>  
>> Actually, I wonder if the affected hardware pre-dates d3cold and uses
>> the old proprietary AMD ATPX interface to control dGPU power.  In that
>> case, the d3cold is managed by the driver rather than the PCI/ACPI
>> subsystems.  IIRC, there was a workaround in the PCIe hotplug code to
>> avoid calling the pci remove function when the driver powered down the
>> GPU via ATPX (or the nvidia equivalent).  If so, this check should go
>> in amdgpu_device_supports_px() instead.
>
>Gabriel,
>
>Can you please share a full kernel log so we can clarify which method  
>your hardware uses?
>

Sure thing. I am attaching a kernel output from last night (it actually crashed
what looks to be the renoir APU as I tried to turn off the computer, which
shows up in the logs towards the end).

Some caveats about my system:
 - I'm using some modified ACPI tables:
   - I've tweaked some WMI-related WMAX code (read/write GPIO for RGB controller)
   - I've fixed a missing symbol issue (renamed _EC0 to __EC)
   - Fixed a bunch of other warnings reported by iasl
 - I have `#define DEBUG 1` in amdgpu_drv.c
 - I have a patch from https://bugzilla.kernel.org/show_bug.cgi?id=215884
   applied
 - My kernel is using Gentoo patches

Looking at my dmesg output, it looks like I'm using ATPX:
[  +0.000022] amdgpu: vga_switcheroo: detected switching method
                      \_SB_.PCI0.GP17.VGA_.ATPX handle
[  +0.001561] amdgpu: ATPX version 1, functions 0x00000001
[  +0.000120] amdgpu: ATPX Hybrid Graphics

Also I see this in my ACPI table dissasembly:
  Scope (\_SB.PCI0.GP17.VGA)
  {
      Name (M189, Buffer (0x0100){})
      Name (M190, Ones)
      Name (M191, Ones)
      Method (ATPX, 2, Serialized)
      {


If you need me to recompile the kernel and/or disable my changes to my ACPI
tables, let me know.

Thanks,

Gabriel


>Thanks,
>>  
>> Alex
>>  
>>>
>>>
>>>
>>>                                     diff --git a/drivers/pci/pci.c
>>> b/drivers/pci/pci.c
>>> index 0b29ec6e8e5e2..01691f1d26fbe 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -2751,6 +2751,10 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>>>           if (target_state == PCI_POWER_ERROR)
>>>                   return -EIO;
>>>
>>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
>>> +           target_state == PCI_D3cold)
>>> +               return -EBUSY;
>>> +
>>>           pci_enable_wake(dev, target_state, wakeup);
>>>
>>>           error = pci_set_power_state(dev, target_state);
>>> @@ -2797,6 +2801,10 @@ int pci_finish_runtime_suspend(struct pci_dev *dev)
>>>           if (target_state == PCI_POWER_ERROR)
>>>                   return -EIO;
>>>
>>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE) &&
>>> +            target_state == PCI_D3cold)
>>> +               return -EBUSY;
>>> +
>>>           __pci_enable_wake(dev, target_state, pci_dev_run_wake(dev));
>>>
>>>           error = pci_set_power_state(dev, target_state);
>>>>
>>>> Alex
>>>>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>>>>    1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index 764d41434a82f..7db796ebb967e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -419,6 +419,9 @@ bool amdgpu_device_supports_boco(struct drm_device *dev)
>>>>>    {
>>>>>           struct amdgpu_device *adev = drm_to_adev(dev);
>>>>>
>>>>> +       if (!IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE))
>>>>> +               return false;
>>>>> +
>>>>>           if (adev->has_pr3 ||
>>>>>               ((adev->flags & AMD_IS_PX) && amdgpu_is_atpx_hybrid()))
>>>>>                   return true;
>>>>> --
>>>>> 2.43.0
>>>>>
>>>


Attachment: dmesg_with_flip_done
Description: Binary data


[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux