Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes from the INFO

There is no hole on 32-bit, unfortunately. It looks like the hole on 64-bit is now ABI.

I moved the field to replace _pad1. The patch is attached (with your Rb).

Marek
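
For reference, the 32-bit/64-bit difference comes from __u64 alignment; here
is a minimal sketch (a hypothetical struct, not the real
drm_amdgpu_info_device layout):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical example to show the hole, not the real UAPI struct. */
struct example {
	uint32_t a;
	/* 4-byte hole here on x86-64, where uint64_t members are 8-byte
	 * aligned; no hole on i386, where they are only 4-byte aligned. */
	uint64_t b;
};

int main(void)
{
	/* Prints 16 on x86-64 and 12 on i386. */
	printf("sizeof(struct example) = %zu\n", sizeof(struct example));
	return 0;
}

Because the 64-bit hole can't be filled without shifting every later field on
32-bit, reusing the explicit _pad1 member keeps the struct identical on both
ABIs.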

On Fri, Jan 13, 2023 at 4:20 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
On Fri, Jan 13, 2023 at 4:02 PM Marek Olšák <maraeo@xxxxxxxxx> wrote:
>
> I've added the comments, and indeed pahole shows the hole as expected.

What about on 32-bit?

Alex

>
> Marek
>
> On Thu, Jan 12, 2023 at 11:44 AM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>>
>> On Thu, Jan 12, 2023 at 6:50 AM Christian König
>> <christian.koenig@xxxxxxx> wrote:
>> >
>> > Am 11.01.23 um 21:48 schrieb Alex Deucher:
>> > > On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák <maraeo@xxxxxxxxx> wrote:
>> > >> Yes, it's meant to be like a spec sheet. We are not interested in the current bandwidth utilization.
>> > > After chatting with Marek on IRC and thinking about this more, I think
>> > > this patch is fine.  It's not really meant for bandwidth per se, but
>> > > rather as a limit to determine what the driver should do in certain
>> > > cases (i.e., when does it make sense to copy to vram vs not).  It's
>> > > not straightforward for userspace to parse the full topology to
>> > > determine what links may be slow.  I guess one potential pitfall would
>> > > be that if you pass the device into a VM, the driver may report the
>> > > wrong values.  Generally, in a VM the guest doesn't get the full view up
>> > > to the root port.  I don't know if the hypervisors report properly for
>> > > pcie_bandwidth_available() in a VM or if it just shows the info about
>> > > the endpoint in the VM.
>> >
>> > So this basically doesn't return the gen and lanes of the device, but
>> > rather what was negotiated between the device and the upstream root port?
>>
>> Correct. It exposes the max gen and lanes of the slowest link between
>> the device and the root port.
>>
>> >
>> > If I got that correctly, then we should probably document that, because
>> > otherwise somebody will try to "fix" it at some point.
>>
>> Good point.
>>
>> Alex
>>
>> >
>> > Christian.
>> >
>> > >
>> > > Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx>
>> > >
>> > > Alex
>> > >
>> > >> Marek
>> > >>
>> > >> On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo <Lijo.Lazar@xxxxxxx> wrote:
>> > >>>
>> > >>> To clarify, with DPM in place, the current bandwidth will change based on the load.
>> > >>>
>> > >>> If apps/UMDs already have a way to know the current bandwidth utilization, then the possible maximum could also be part of the same API. Otherwise, this only looks like duplicate information. We have the same information in the sysfs DPM nodes.
>> > >>>
>> > >>> BTW, I don't know to what extent apps/UMDs really make use of this. Take the memory frequency as an example (I'm reading it as 16 GHz). It only looks like a spec sheet.
>> > >>>
>> > >>> Thanks,
>> > >>> Lijo
>> > >>>
>> > >>> From: Marek Olšák <maraeo@xxxxxxxxx>
>> > >>> Sent: Wednesday, January 4, 2023 8:40:00 PM
>> > >>> To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
>> > >>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> > >>> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes from the INFO
>> > >>>
>> > >>> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
>> > >>>
>> > >>>
>> > >>>
>> > >>> On 1/4/2023 7:43 PM, Marek Olšák wrote:
>> > >>>> On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo <lijo.lazar@xxxxxxx> wrote:
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>      On 1/4/2023 4:11 AM, Marek Olšák wrote:
>> > >>>>       > I see. Well, those sysfs files are not usable, and I don't think it
>> > >>>>       > would be important even if they were usable, but for completeness:
>> > >>>>       >
>> > >>>>       > The ioctl returns:
>> > >>>>       >      pcie_gen = 1
>> > >>>>       >      pcie_num_lanes = 16
>> > >>>>       >
>> > >>>>       > Theoretical bandwidth from those values: 4.0 GB/s
>> > >>>>       > My DMA test shows this write bandwidth: 3.5 GB/s
>> > >>>>       > It matches the expectation.
>> > >>>>       >
>> > >>>>       > Let's see the devices (there is only 1 GPU Navi21 in the system):
>> > >>>>       > $ lspci |egrep '(PCI|VGA).*Navi'
>> > >>>>       > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi
>> > >>>>      10 XL
>> > >>>>       > Upstream Port of PCI Express Switch (rev c3)
>> > >>>>       > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi
>> > >>>>      10 XL
>> > >>>>       > Downstream Port of PCI Express Switch
>> > >>>>       > 0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
>> > >>>>       > [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c3)
>> > >>>>       >
>> > >>>>       > Let's read sysfs:
>> > >>>>       >
>> > >>>>       > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_width
>> > >>>>       > 16
>> > >>>>       > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_width
>> > >>>>       > 16
>> > >>>>       > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_width
>> > >>>>       > 16
>> > >>>>       > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_speed
>> > >>>>       > 2.5 GT/s PCIe
>> > >>>>       > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_speed
>> > >>>>       > 16.0 GT/s PCIe
>> > >>>>       > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_speed
>> > >>>>       > 16.0 GT/s PCIe
>> > >>>>       >
>> > >>>>       > Problem 1: None of the speed numbers match 4 GB/s.
>> > >>>>
>> > >>>>      US bridge = 2.5 GT/s means operating at PCIe Gen 1 speed. The total
>> > >>>>      theoretical bandwidth is then derived from the encoding and the total
>> > >>>>      number of lanes.
>> > >>>>
>> > >>>>       > Problem 2: Userspace doesn't know the bus index of the bridges,
>> > >>>>      and it's
>> > >>>>       > not clear which bridge should be used.
>> > >>>>
>> > >>>>      In general, modern ones have this arch: US->DS->EP. US is the one
>> > >>>>      connected to the physical link.
>> > >>>>
>> > >>>>       > Problem 3: The PCIe gen number is missing.
>> > >>>>
>> > >>>>      Current link speed is based on whether it's Gen1/2/3/4/5.
>> > >>>>
>> > >>>>      BTW, your patch makes use of capability flags, which give the maximum
>> > >>>>      speed/width supported by the device. It may not necessarily reflect the
>> > >>>>      currently negotiated speed/width. I guess on NV, this info is already
>> > >>>>      obtained from the PMFW and made available through the metrics table.
>> > >>>>
>> > >>>>
>> > >>>> It computes the minimum of the device PCIe gen and the motherboard/slot
>> > >>>> PCIe gen to get the final value. These 2 lines do that. The low 16 bits
>> > >>>> of the mask contain the device PCIe gen mask. The high 16 bits of the
>> > >>>> mask contain the slot PCIe gen mask. (A worked example of this math
>> > >>>> appears below, just before the attached patch.)
>> > >>>> + pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 16);
>> > >>>> + dev_info->pcie_gen = fls(pcie_gen_mask);
>> > >>>>
>> > >>> With DPM in place on some ASICs, how much does this static info help for
>> > >>> upper level apps?
>> > >>>
>> > >>>
>> > >>> It helps UMDs make better decisions if they know the maximum achievable bandwidth. UMDs also compute the maximum memory bandwidth and compute performance (FLOPS). Right now it's printed by Mesa to give users detailed information about their GPU. For example:
>> > >>>
>> > >>> $ AMD_DEBUG=info glxgears
>> > >>> Device info:
>> > >>>      name = NAVI21
>> > >>>      marketing_name = AMD Radeon RX 6800
>> > >>>      num_se = 3
>> > >>>      num_rb = 12
>> > >>>      num_cu = 60
>> > >>>      max_gpu_freq = 2475 MHz
>> > >>>      max_gflops = 19008 GFLOPS
>> > >>>      l0_cache_size = 16 KB
>> > >>>      l1_cache_size = 128 KB
>> > >>>      l2_cache_size = 4096 KB
>> > >>>      l3_cache_size = 128 MB
>> > >>>      memory_channels = 16 (TCC blocks)
>> > >>>      memory_size = 16 GB (16384 MB)
>> > >>>      memory_freq = 16 GHz
>> > >>>      memory_bus_width = 256 bits
>> > >>>      memory_bandwidth = 512 GB/s
>> > >>>      pcie_gen = 1
>> > >>>      pcie_num_lanes = 16
>> > >>>      pcie_bandwidth = 4.0 GB/s
>> > >>>
>> > >>> Marek
>> >
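
To make Marek's two mask lines and Mesa's 4.0 GB/s figure concrete, here is a
minimal userspace sketch of the same math. The mask value and the per-gen
rates are illustrative assumptions (the real gen masks come from amd_pcie.h),
and 32 - __builtin_clz() stands in for the kernel's fls():

#include <stdio.h>

int main(void)
{
	/* Hypothetical mask: device supports gen 1-4 (low 16 bits),
	 * platform/slot supports gen 1-3 (high 16 bits). */
	unsigned int mask = 0x0007000f;
	unsigned int combined = mask & (mask >> 16);	/* 0x0007 */
	int pcie_gen = combined ? 32 - __builtin_clz(combined) : 0;	/* 3 */

	/* Approximate usable GB/s per lane per gen, accounting for
	 * 8b/10b (gen 1-2) and 128b/130b (gen 3+) encoding overhead. */
	static const double gbps_per_lane[] =
		{ 0, 0.25, 0.5, 0.985, 1.969, 3.938 };

	printf("pcie_gen = %d\n", pcie_gen);
	/* Gen 1 x16, as in the Mesa output above: 0.25 * 16 = 4.0 GB/s. */
	printf("pcie_bandwidth = %.1f GB/s\n", gbps_per_lane[1] * 16);
	return 0;
}
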
From 6220395fb0b9c10c92ea67b80e09120e6f92a499 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak@xxxxxxx>
Date: Sat, 24 Dec 2022 17:44:26 -0500
Subject: [PATCH] drm/amdgpu: return the PCIe gen and lanes from the INFO ioctl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For computing PCIe bandwidth in userspace and troubleshooting PCIe
bandwidth issues.

For example, my Navi21 has been limited to PCIe gen 1, and this is
the first time I noticed it after 2 years.

Note that this intentionally fills a hole and padding
in drm_amdgpu_info_device.

Signed-off-by: Marek Olšák <marek.olsak@xxxxxxx>
Reviewed-by: Alex Deucher <alexander.deucher@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 14 +++++++++++++-
 include/uapi/drm/amdgpu_drm.h           |  6 ++++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 06aba201d4db..a75dba2caeca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -106,9 +106,10 @@
  * - 3.49.0 - Add gang submit into CS IOCTL
  * - 3.50.0 - Update AMDGPU_INFO_DEV_INFO IOCTL for minimum engine and memory clock
  *            Update AMDGPU_INFO_SENSOR IOCTL for PEAK_PSTATE engine and memory clock
+ * - 3.51.0 - Return the PCIe gen and lanes from the INFO ioctl
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	50
+#define KMS_DRIVER_MINOR	51
 #define KMS_DRIVER_PATCHLEVEL	0
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 903e8770e275..fba306e0ef87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -42,6 +42,7 @@
 #include "amdgpu_gem.h"
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
+#include "amd_pcie.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -766,6 +767,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	case AMDGPU_INFO_DEV_INFO: {
 		struct drm_amdgpu_info_device *dev_info;
 		uint64_t vm_size;
+		uint32_t pcie_gen_mask;
 		int ret;
 
 		dev_info = kzalloc(sizeof(*dev_info), GFP_KERNEL);
@@ -798,7 +800,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		dev_info->num_rb_pipes = adev->gfx.config.max_backends_per_se *
 			adev->gfx.config.max_shader_engines;
 		dev_info->num_hw_gfx_contexts = adev->gfx.config.max_hw_contexts;
-		dev_info->_pad = 0;
 		dev_info->ids_flags = 0;
 		if (adev->flags & AMD_IS_APU)
 			dev_info->ids_flags |= AMDGPU_IDS_FLAGS_FUSION;
@@ -852,6 +853,17 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 
 		dev_info->tcc_disabled_mask = adev->gfx.config.tcc_disabled_mask;
 
+		/* Combine the chip gen mask with the platform (CPU/mobo) mask. */
+		pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 16);
+		dev_info->pcie_gen = fls(pcie_gen_mask);
+		dev_info->pcie_num_lanes =
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X32 ? 32 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X16 ? 16 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X12 ? 12 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X8 ? 8 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X4 ? 4 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X2 ? 2 : 1;
+
 		ret = copy_to_user(out, dev_info,
 				   min((size_t)size, sizeof(*dev_info))) ? -EFAULT : 0;
 		kfree(dev_info);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index fe7f871e3080..973af6d06626 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -1053,7 +1053,8 @@ struct drm_amdgpu_info_device {
 	__u32 enabled_rb_pipes_mask;
 	__u32 num_rb_pipes;
 	__u32 num_hw_gfx_contexts;
-	__u32 _pad;
+	/* PCIe version (the smaller of the GPU and the CPU/motherboard) */
+	__u32 pcie_gen;
 	__u64 ids_flags;
 	/** Starting virtual address for UMDs. */
 	__u64 virtual_address_offset;
@@ -1100,7 +1101,8 @@ struct drm_amdgpu_info_device {
 	__u32 gs_prim_buffer_depth;
 	/* max gs wavefront per vgt*/
 	__u32 max_gs_waves_per_vgt;
-	__u32 _pad1;
+	/* PCIe number of lanes (the smaller of the GPU and the CPU/motherboard) */
+	__u32 pcie_num_lanes;
 	/* always on cu bitmap */
 	__u32 cu_ao_bitmap[4][4];
 	/** Starting high virtual address for UMDs. */
-- 
2.34.1
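
For completeness, a UMD could consume the new fields roughly like this (a
sketch using libdrm's amdgpu wrapper; error handling is trimmed and the
render node path is an assumption):

/* build (sketch): cc info.c $(pkg-config --cflags --libs libdrm_amdgpu) */
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

int main(void)
{
	uint32_t major, minor;
	amdgpu_device_handle dev;
	struct drm_amdgpu_info_device info = {0};
	int fd = open("/dev/dri/renderD128", O_RDWR);	/* assumed node */

	if (fd < 0 || amdgpu_device_initialize(fd, &major, &minor, &dev))
		return 1;

	/* On kernels older than DRM 3.51 these fields are the zeroed
	 * _pad/_pad1 members, so check the minor version first. */
	if (minor >= 51 &&
	    !amdgpu_query_info(dev, AMDGPU_INFO_DEV_INFO, sizeof(info), &info))
		printf("pcie_gen = %u, pcie_num_lanes = %u\n",
		       info.pcie_gen, info.pcie_num_lanes);

	amdgpu_device_deinitialize(dev);
	return 0;
}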

