Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 03.07.20 um 08:05 schrieb Felix Kuehling:
Am 2020-07-01 um 10:34 a.m. schrieb Li, Dennis:
[AMD Official Use Only - Internal Distribution Only]

Hi, Christian and Alex
       Not only amdgpu ioctls, but amdkfd ioctls also have the same issue.
Most KFD ioctls don't access HW directly. The only place that interacts
with HW in KFD is the device queues manager (DQM) and beneath it the
packet manager. In DQM we already have protections to avoid HW access
while a reset is in progress.

For other HW access, KFD goes through helper functions in amdgpu.

Memory management ioctls indirectly access HW for page table updates.
However, that requires validating the page table BOs first. Are VRAM BOs
considered "valid" during a GPU reset? When using SDMA for page table
updates, the DRM GPU scheduler is also involved. Is that suspended
during a GPU reset?

That stuff should work concurrently. The scheduler is stopped during a reset, but we can still push new jobs to the queues.

Stuff like TLB flushes are also harmless since after a reset we can safely assume that the TLB is completely empty.

The only other KFD ioctl that looks like it might access HW during a GPU
reset is kfd_ioctl_get_clock_counters by calling
amdgpu_amdkfd_get_gpu_clock_counter.

Yeah, that is indeed a problem which needs handling.

Christian.


Regards,
   Felix



Best Regards
Dennis Li
-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Christian König
Sent: Wednesday, July 1, 2020 4:20 PM
To: Alex Deucher <alexdeucher@xxxxxxxxx>; amd-gfx list <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: return an error for hw access in INFO ioctl when in reset

I don't think this is a good idea, we should probably rather wait for the GPU reset to finish by taking the appropriate lock.

Christian.

Am 01.07.20 um 07:33 schrieb Alex Deucher:
ping?

On Fri, Jun 26, 2020 at 10:04 AM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
When the GPU is in reset, accessing the hw is unreliable and could
interfere with the reset.  Return an error in those cases.

Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 ++++++
   1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 341d072edd95..fd51d6554ee2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -684,6 +684,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
                  if (info->read_mmr_reg.count > 128)
                          return -EINVAL;

+               if (adev->in_gpu_reset)
+                       return -EPERM;
+
                  regs = kmalloc_array(info->read_mmr_reg.count, sizeof(*regs), GFP_KERNEL);
                  if (!regs)
                          return -ENOMEM; @@ -854,6 +857,9 @@ static
int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
                  if (!adev->pm.dpm_enabled)
                          return -ENOENT;

+               if (adev->in_gpu_reset)
+                       return -EPERM;
+
                  switch (info->sensor_info.type) {
                  case AMDGPU_INFO_SENSOR_GFX_SCLK:
                          /* get sclk in Mhz */
--
2.25.4

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://list
s.freedesktop.org/mailman/listinfo/amd-gfx
nnis.Li%40amd.com%7Cefeeda4b6d194660fbc508d81d9791a3%7C3dd8961fe4884e6
08e11a82d994e183d%7C0%7C0%7C637291884123360340&amp;sdata=GNPWQNndUJKx7
70fDTuRGBnJzfmRUQjD4B1HBie3xUQ%3D&amp;reserved=0
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux