[Bug 219611] New: Read of pcie_bw sysfs file on AMD GPU blocks for 1 second

bugzilla-daemon@xxxxxxxxxx · Wed, 18 Dec 2024 14:49:58 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=219611

            Bug ID: 219611
           Summary: Read of pcie_bw sysfs file on AMD GPU blocks for 1
                    second
           Product: Drivers
           Version: 2.5
          Hardware: Intel
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@xxxxxxxxxxxxxxxxxxxx
          Reporter: yumpusamongus+kernelbugzilla@xxxxxxxxx
        Regression: No

Several userspace resource monitors have been tripped up by this:

https://github.com/Syllo/nvtop/issues/139  

https://github.com/Syllo/nvtop/issues/208  

https://github.com/aristocratos/btop/issues/793  

https://gitlab.com/mission-center-devs/mission-center/-/issues/309

The behavior is highly unusual: a read() of a sysfs file normally returns
almost immediately, so userspace would need special handling for just this one
file.
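
As an illustration of what such special treatment might look like, here is a
minimal sketch that polls the slow file on a worker thread and hands the UI
loop a cached value. The class name CachedSysfsValue and the polling interval
are hypothetical and not taken from any of the projects linked above.

```python
import threading


class CachedSysfsValue:
    """Poll a possibly slow file on a worker thread and cache the last result.

    Hypothetical helper for illustration only.
    """

    def __init__(self, path, interval=1.0):
        self._path = path
        self._interval = interval
        self._lock = threading.Lock()
        self._value = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Waits for any read in progress to finish before returning.
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            try:
                with open(self._path) as f:
                    text = f.read().strip()  # may block for about a second
            except OSError:
                text = None
            with self._lock:
                self._value = text
            # Sleep until the next poll, or wake early if stop() is called.
            self._stop.wait(self._interval)

    def latest(self):
        """Return the most recent value without blocking (None if not read yet)."""
        with self._lock:
            return self._value
```

The main loop would call latest() on every refresh and never touch the file
itself, at the price of one extra thread per monitored file.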

The docs say "The amdgpu driver provides a sysfs API for estimating how much
data has been received and sent by the GPU in the last second through PCIe".
Specifically, the LAST second, not the second starting when read() was called.
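
The blocking can be observed by timing a single read from userspace. The
following is a small sketch; the helper name timed_read is made up for this
example.

```python
import time


def timed_read(path):
    """Read a whole file and return (contents, seconds spent in the read)."""
    start = time.monotonic()
    with open(path) as f:
        data = f.read()
    elapsed = time.monotonic() - start
    return data, elapsed
```

Calling timed_read("/sys/class/drm/card0/device/pcie_bw") on an affected
system should report roughly one second; the card0 index is an assumption and
will differ between machines.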

The culprit, as far as I can tell, is the msleep here:
https://elixir.bootlin.com/linux/v6.12.4/source/drivers/gpu/drm/amd/amdgpu/soc15.c#L756
(the same code is copy-pasted in 4 places).

I am not familiar with the intricacies of AMD GPUs, but what would be the cost
of keeping those counters enabled all the time and reporting the number of
messages in some recent second? Or, even better, ripping this out and exposing
the cumulative message counts directly, so userspace can choose whichever
sample rate it wants?
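
To make the second suggestion concrete, here is a rough sketch of how
userspace could derive rates if the driver exposed monotonically increasing
counts. This interface does not exist today; the Sample fields and the
message_rates helper are hypothetical, and counter wraparound is not handled.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds, from a monotonic clock
    received: int     # cumulative received-message count (hypothetical)
    sent: int         # cumulative sent-message count (hypothetical)


def message_rates(prev, curr):
    """Return (received/s, sent/s) between two cumulative samples."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    return ((curr.received - prev.received) / dt,
            (curr.sent - prev.sent) / dt)
```

With this shape, each read returns immediately and the sampling window is
simply the time between two reads, so a monitor refreshing every 500 ms and
one refreshing every 5 s would both get correct averages.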
