Re: [PATCH] drm/amdgpu: fix an underflow on non-4KB-page systems

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 30.03.21 um 14:04 schrieb Xi Ruoyao:
On 2021-03-30 03:40 +0800, Xi Ruoyao wrote:
On 2021-03-29 21:36 +0200, Christian König wrote:
Am 29.03.21 um 21:27 schrieb Xi Ruoyao:
Hi Christian,

I don't think there is any constraint implemented to ensure `num_entries %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0`.  For example, in `amdgpu_vm_bo_map()`:

          /* validate the parameters */
          if (saddr & AMDGPU_GPU_PAGE_MASK || offset & AMDGPU_GPU_PAGE_MASK
              size == 0 || size & AMDGPU_GPU_PAGE_MASK)
                  return -EINVAL;

/* snip */

          saddr /= AMDGPU_GPU_PAGE_SIZE;
          eaddr /= AMDGPU_GPU_PAGE_SIZE;

/* snip */

          mapping->start = saddr;
          mapping->last = eaddr;


If we really want to ensure (mapping->last - mapping->start + 1) %
AMDGPU_GPU_PAGES_IN_CPU_PAGE == 0, then we should replace
"AMDGPU_GPU_PAGE_MASK"
in "validate the parameters" with "PAGE_MASK".
It should be "~PAGE_MASK", "PAGE_MASK" has an opposite convention of
"AMDGPU_GPU_PAGE_MASK" :(.

Yeah, good point.

I tried it and it broke userspace: Xorg startup fails with EINVAL with
this
change.
Well in theory it is possible that we always fill the GPUVM on a 4k
basis while the native page size of the CPU is larger. Let me double
check the code.
On my platform, there are two issues both causing the VM corruption.  One is
fixed by agd5f/linux@fe001e7.

What is fe001e7? A commit id? If yes then that is to short and I can't find it.

   Another is in Mesa from userspace:  it uses
`dev_info->gart_page_size` as the alignment, but the kernel AMDGPU driver
expects it to use `dev_info->virtual_address_alignment`.

Mhm, looking at the kernel code I would rather say Mesa is correct and the kernel driver is broken.

The gart_page_size is limited by the CPU page size, but the virtual_address_alignment isn't.

If we can make the change to fill the GPUVM on a 4k basis, we can fix this issue
and make virtual_address_alignment = 4K.  Otherwise, we should fortify the
parameter validation, changing "AMDGPU_GPU_PAGE_MASK" to "~PAGE_MASK".  Then the
userspace will just get an EINVAL, instead of a slient VM corruption.  And
someone should tell Mesa developers to fix the code in this case.

I rather see this as a kernel bug. Can you test if this code fragment fixes your issue:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 64beb3399604..e1260b517e1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -780,7 +780,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
                }
                dev_info->virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);                 dev_info->pte_fragment_size = (1 << adev->vm_manager.fragment_size) * AMDGPU_GPU_PAGE_SIZE;
-               dev_info->gart_page_size = AMDGPU_GPU_PAGE_SIZE;
+               dev_info->gart_page_size = dev_info->virtual_address_alignment;
                dev_info->cu_active_number = adev->gfx.cu_info.number;
                dev_info->cu_ao_mask = adev->gfx.cu_info.ao_cu_mask;
                dev_info->ce_ram_size = adev->gfx.ce_ram_size;


Thanks,
Christian.

--
Xi Ruoyao <xry111@xxxxxxxxxxxxxxxx>
School of Aerospace Science and Technology, Xidian University


_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux