Re: [PATCH] drm/amdgpu: Higher log level for missing PCIe atomics caps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2025-03-19 03:02, Daisuke Matsuda (Fujitsu) wrote:
On Tue, Mar 18, 2025 5:35 AM Felix Kuehling wrote:
On 2025-03-17 15:07, Deucher, Alexander wrote:
[Public]

-----Original Message-----
From: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
Sent: Thursday, March 13, 2025 9:18 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Deucher,
Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian
<Christian.Koenig@xxxxxxx>
Cc: airlied@xxxxxxxxx; simona@xxxxxxxx; Daisuke Matsuda <matsuda-
daisuke@xxxxxxxxxxx>
Subject: [PATCH] drm/amdgpu: Higher log level for missing PCIe atomics caps

Currently, ROCm requires CPUs that support PCIe atomics. The message is more
urgent for GPGPU users, meaning basic functionalities of ROCm are not available
on the node.

+ Felix

My understanding is that PCIe atomics are not strictly required, but there are some features that are not available
without them.  Warning seems a bit overkill and potentially confusing to users that have an existing system that
otherwise works fine.

I could see either argument. The GPU is capable of PCIe atomics, but the system configuration (chipset, PCIe switch, etc.)
is preventing them from working, which has an impact on some application-visible functionality. A naive application that
depends on atomics and is not carefully written to provide fallbacks or fail gracefully in case atomics are unavailable,
would fail silently or produce incorrect results. So maybe that justifies a warning message.
The Linux kernel commonly uses "warn level" messages to alert users when an incompatibility restricts functionality, and this approach aligns with that practice.
To address the concern about the message being too general, how about making it more specific like this:
   dev_warn(adev->dev, "PCIe atomic ops are not supported.  ROCm performance and functionality may be limited.\n");
This is more direct and focused on the impact to ROCm users.

I think it's not a performance problem. There is no general lower-performing fallback I'm aware of. The functionality impact is limited to kernels that try to do fine-grained synchronization with atomic operations to system memory. So I'd be wary of making the message too scary without giving any detail about how to avoid problems. Anyone who reads this message and doesn't know what PCIe atomics are, is probably not affected.

Regards,
  Felix



Regards,
Daisuke

Regards,
   Felix


Alex


Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 018dfccd771b..faeef136e272 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4374,7 +4374,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
                       return r;
       }

-     /* enable PCIE atomic ops */
+     /* enable PCIe atomic ops */
       if (amdgpu_sriov_vf(adev)) {
               if (adev->virt.fw_reserve.p_pf2vf)
                       adev->have_atomics_support = ((struct
amd_sriov_msg_pf2vf_info *) @@ -4395,7 +4395,7 @@ int
amdgpu_device_init(struct amdgpu_device *adev,
       }

       if (!adev->have_atomics_support)
-             dev_info(adev->dev, "PCIE atomic ops is not supported\n");
+             dev_warn(adev->dev, "PCIe atomic ops are not supported\n");

       /* doorbell bar mapping and doorbell index init*/
       amdgpu_doorbell_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
index b4f9c2f4e92c..c52605a07597 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
@@ -240,7 +240,7 @@ struct amd_sriov_msg_pf2vf_info {
       } mm_bw_management[AMD_SRIOV_MSG_RESERVE_VCN_INST];
       /* UUID info */
       struct amd_sriov_msg_uuid_info uuid_info;
-     /* PCIE atomic ops support flag */
+     /* PCIe atomic ops support flag */
       uint32_t pcie_atomic_ops_support_flags;
       /* Portion of GPU memory occupied by VF.  MAX value is 65535, but set to
uint32_t to maintain alignment with reserved size */
       uint32_t gpu_capacity;
--
2.39.1



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux