[PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for large BOs

Felix.Kuehling@xxxxxxx (Kuehling, Felix) · Wed, 29 Mar 2017 03:24:50 +0000

Hi Jerry,

Let me clarify my understanding of the problem and the intention of the fix.

Address range per page: 4KB
Address range per PT (level 3): 2MB
Address range per PD (level 2): 1GB
Address range per PD (level 1): 512GB
Address range per PD (level 0): 256TB

Imagine for example a mapping that spans multiple level 2 page directories. Say from 0.5GB to 2.5GB.

At level 0:
from = 0
to = 0

At level 1:
from = 0
to = 2

So for level 2 amdgpu_vm_alloc_levels gets called 3 times (pt_idx = [0..2]). Without my change, it gets called with the same saddr and eaddr 3 times. So it calculates the same "from" and "to" each time:
from = 256 % 512 = 256
to = 767 % 512 = 255

Obviously that doesn't work. What we really want is this (from..to in level 2 when called for pt_idx values from level 1):

For pt_idx=0 we want to fill the second half of the level 2 page directory with page tables:
from = 256
to = 511

For pt_idx=1 we need to fill the entire level 2 page directories with page tables:
from = 0
to = 511

For pt_idx=2 we want to fill the first half of the level 2 page directory with page tables:
from = 0
to = 255

This creates a contiguous range of page tables across the three level 2 page directories that are spanned by the allocation. My change achieves that.

I think you're right, there are some extra high bits in saddr and eaddr, but they get discarded by the modulo-division in the next recursion level. For added clarity the modulo division (or masking of high bits) could be moved up to the caller.

Regards,
  Felix

--
F e l i x   K u e h l i n g
SMTS Software Development Engineer | Vertical Workstation/Compute
1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
(O) +1(289)695-1597
   _     _   _   _____   _____
  / \   | \ / | |  _  \  \ _  |
 / A \  | \M/ | | |D) )  /|_| |
/_/ \_\ |_| |_| |_____/ |__/ \|   facebook.com/AMD | amd.com
________________________________
From: Zhang, Jerry
Sent: Tuesday, March 28, 2017 10:54:03 PM
To: Kuehling, Felix; amd-gfx at lists.freedesktop.org; Koenig, Christian; Zhou, David(ChunMing)
Cc: Deucher, Alexander; Russell, Kent
Subject: Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for large BOs

On 03/29/2017 09:00 AM, Felix Kuehling wrote:
> Fix the start/end address calculation for address ranges that span
> multiple page directories in amdgpu_vm_alloc_levels.
>
> Add WARN_ONs if page tables aren't found. Otherwise the page table
> update would just fail silently.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 818747f..44aa039 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -278,6 +278,8 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>        from = (saddr >> shift) % amdgpu_vm_num_entries(adev, level);
>        to = (eaddr >> shift) % amdgpu_vm_num_entries(adev, level);
>
> +     WARN_ON(from > to);
> +
>        if (to > parent->last_entry_used)
>                parent->last_entry_used = to;
>
> @@ -312,8 +314,11 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>                }
>
>                if (level < adev->vm_manager.num_level) {
> -                     r = amdgpu_vm_alloc_levels(adev, vm, entry, saddr,
> -                                                eaddr, level);
> +                     uint64_t sub_saddr = (pt_idx == from) ? saddr : 0;
> +                     uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
> +                             ((1 << shift) - 1);
> +                     r = amdgpu_vm_alloc_levels(adev, vm, entry, sub_saddr,
> +                                                sub_eaddr, level);

Do you mean to create a full sub-pt if pt_idx != from and != to?
I didn't catch it up clear.

In my perspective, we may need to clear the saddr and eaddr upper level bit
before going on to allocate the next level PT.
Otherwise, the number of next PT entry will be incorrect, more than num_entries
as the upper level PT number calculated in.

e.g.
there is a address contains
PD + PT1 + PT2 + PT3

for the first round, we could get correct "from" and "to" address with right
"shift"(3 blocks), then walk over the address space in PD range.

Then for the 2nd round, we cannot get the expected "from" and "to" address with
"shift"(2 blocks), as it walks over the address space in PD + PT1 range.
While we'd like it's in range PT1 indeed.

The fault way goes on with later round.

Perhaps, that's the root cause about the issue you face.

Jerry.

>                        if (r)
>                                return r;
>                }
> @@ -957,9 +962,11 @@ static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
>                entry = &entry->entries[idx];
>        }
>
> -     if (level)
> +     if (WARN_ON(level))
>                return NULL;
>
> +     WARN_ON(!entry->bo);
> +
>        return entry->bo;
>   }
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20170329/1fa2b02f/attachment.html>