On 2019-07-02 11:49 a.m., Timur Kristóf wrote:
> On Tue, 2019-07-02 at 10:09 +0200, Michel Dänzer wrote:
>> On 2019-07-01 6:01 p.m., Timur Kristóf wrote:
>>> On Mon, 2019-07-01 at 16:54 +0200, Michel Dänzer wrote:
>>>> On 2019-06-28 2:21 p.m., Timur Kristóf wrote:
>>>>> I haven't found a good way to measure the maximum PCIe throughput
>>>>> between the CPU and GPU,
>>>>
>>>> amdgpu.benchmark=3
>>>>
>>>> on the kernel command line will measure throughput for various
>>>> transfer sizes during driver initialization.
>>>
>>> Thanks, I will definitely try that.
>>> Is this the only way to do this, or is there a way to benchmark it
>>> after the system has already booted?
>>
>> The former. At least in theory, it's possible to unload the amdgpu
>> module while nothing is using it, then load it again.
>
> Okay, so I booted my system with amdgpu.benchmark=3
> You can find the full dmesg log here: https://pastebin.com/zN9FYGw4
>
> The result is between 1 and 5 Gbit/s depending on the transfer size
> (the larger the transfer, the higher the throughput), which
> corresponds to neither the 8 Gbit/s that the kernel thinks it is
> limited to, nor the 20 Gbit/s which I measured earlier with pcie_bw.

5 Gbit/s of measured throughput could be consistent with 8 Gbit/s of
theoretical bandwidth, due to various overhead.

> Since pcie_bw only shows the maximum PCIe packet size (and not the
> actual size), could it be that it's so inaccurate that the 20 Gbit/s
> is a fluke?

Seems likely, or at least plausible.

>>>>> but I did take a look at AMD's sysfs interface at
>>>>> /sys/class/drm/card1/device/pcie_bw while running the
>>>>> bottlenecked game. The highest throughput I saw there was only
>>>>> 2.43 Gbit/s.
>>>>
>>>> PCIe bandwidth generally isn't a bottleneck for games, since they
>>>> don't constantly transfer large data volumes across PCIe, but
>>>> store them in the GPU's local VRAM, which is connected at much
>>>> higher bandwidth.
>>>
>>> There are reasons why I think the problem is the bandwidth:
>>> 1. The same issues don't happen when the GPU is not used with a
>>>    TB3 enclosure.
>>> 2. In the case of radeonsi, the problem was mitigated once Marek's
>>>    SDMA patch was merged, which hugely reduces the PCIe bandwidth
>>>    use.
>>> 3. In less optimized cases (for example D9VK), the problem is
>>>    still very noticeable.
>>
>> However, since you saw as much as ~20 Gbit/s under different
>> circumstances, the 2.43 Gbit/s used by this game clearly isn't a
>> hard limit; there must be other limiting factors.
>
> There may be other factors, yes. I can't offer a good explanation of
> what exactly is happening, but it's pretty clear that amdgpu can't
> take full advantage of the TB3 link, so it seemed like a good idea
> to start investigating this first.

Yeah, actually it would be consistent with ~16-32 KB granularity
transfers based on your measurements above, which is plausible. So
making sure that the driver doesn't artificially limit the PCIe
bandwidth might indeed help.

OTOH this also indicates a similar potential for improvement by using
larger transfers in Mesa and/or the kernel.

-- 
Earthling Michel Dänzer               |              https://www.amd.com
Libre software enthusiast             |             Mesa and X developer
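To make the "various overhead" point concrete: PCIe protocol overhead
alone eats a noticeable fraction of the 8 Gbit/s link rate. A rough
sketch, assuming each TLP carries on the order of 20-30 bytes of
header/framing/CRC around its payload (26 bytes is used below as a
middle-of-the-road guess, and DLLP ACK/flow-control traffic is
ignored entirely):

# Back-of-the-envelope arithmetic for the "various overhead" point.
# Assumption, not from this thread: ~26 bytes of TLP header, framing
# and CRC per packet; DLLP traffic ignored.
LINK_GBIT = 8.0
TLP_OVERHEAD = 26  # bytes per packet (assumed)

for payload in (128, 256):  # common max_payload_size values, in bytes
    efficiency = payload / (payload + TLP_OVERHEAD)
    print(f"{payload} B payload: ~{LINK_GBIT * efficiency:.1f} Gbit/s usable")

Under these assumptions protocol overhead only brings the link down
to roughly 6.6-7.3 Gbit/s, so the remaining gap to the measured
5 Gbit/s is plausibly per-transfer setup cost, as modelled in the
last sketch below.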
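On the pcie_bw question: since the counters behind that file track
packet counts rather than bytes, any bandwidth derived from them is
an upper bound. A minimal sketch of that estimate, assuming the file
contains three space-separated values (packets received, packets
sent, both over the last second, and the maximum payload size in
bytes); verify the exact format against the amdgpu documentation for
your kernel:

# Upper-bound PCIe bandwidth estimate from amdgpu's pcie_bw file.
# Assumed format (verify against your kernel's amdgpu docs):
# "<packets received> <packets sent> <max payload size in bytes>",
# with the packet counts taken over the last second. Because only the
# *maximum* payload size is known, this overestimates real traffic.

PCIE_BW = "/sys/class/drm/card1/device/pcie_bw"

def pcie_bw_upper_bound_gbit(path=PCIE_BW):
    with open(path) as f:
        received, sent, max_payload = (int(x) for x in f.read().split())
    max_bytes_per_sec = (received + sent) * max_payload
    return max_bytes_per_sec * 8 / 1e9

if __name__ == "__main__":
    print(f"<= {pcie_bw_upper_bound_gbit():.2f} Gbit/s (upper bound)")

With a large maximum payload but mostly small actual payloads, this
upper bound can easily be several times the real traffic, which would
explain a 20 Gbit/s reading on an 8 Gbit/s link.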
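Finally, a sanity check of the ~16-32 KB granularity estimate: a toy
latency/bandwidth model in which every DMA transfer pays a fixed
setup cost on top of its time on the wire. The 20 us setup cost below
is an illustrative guess, not a measured value:

# A toy model (assumption, not a measurement): each DMA transfer pays
# a fixed per-transfer setup cost on top of the time the data spends
# on the wire, so small transfers cannot saturate the link.

def effective_gbit(transfer_bytes, link_gbit=8.0, setup_us=20.0):
    wire_s = transfer_bytes * 8 / (link_gbit * 1e9)  # time on the wire
    total_s = setup_us * 1e-6 + wire_s               # plus setup cost
    return transfer_bytes * 8 / total_s / 1e9

for size_kib in (4, 16, 32, 128, 1024):
    print(f"{size_kib:>5} KiB -> {effective_gbit(size_kib * 1024):.2f} Gbit/s")

With these assumed numbers, 4-32 KiB transfers land in the
1.4-5 Gbit/s range that amdgpu.benchmark=3 reported, while 1 MiB
transfers approach the link rate, which is why larger transfers in
Mesa and/or the kernel look like a promising direction.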