Hi Felix,

Thanks for your patient explanation. I see what you mean now, and thanks for
fixing it.

> I think you're right, there are some extra high bits in saddr and eaddr, but
> they get discarded by the modulo-division in the next recursion level. For
> added clarity the modulo division (or masking of high bits) could be moved up
> to the caller.

It may be clearer if the modulo division is done in the caller before
recursing. Anyway,

Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>

On 03/29/2017 11:24 AM, Kuehling, Felix wrote:
> Hi Jerry,
>
> Let me clarify my understanding of the problem and the intention of the fix.
>
> Address range per page: 4KB
> Address range per PT (level 3): 2MB
> Address range per PD (level 2): 1GB
> Address range per PD (level 1): 512GB
> Address range per PD (level 0): 256TB
>
> Imagine for example a mapping that spans multiple level 2 page directories.
> Say from 0.5GB to 2.5GB.
>
> At level 0:
> from = 0
> to = 0
>
> At level 1:
> from = 0
> to = 2
>
> So for level 2 amdgpu_vm_alloc_levels gets called 3 times (pt_idx = [0..2]).
> Without my change, it gets called with the same saddr and eaddr 3 times. So
> it calculates the same "from" and "to" each time:
> from = 256 % 512 = 256
> to = 767 % 512 = 255
>
> Obviously that doesn't work. What we really want is this (from..to in level 2
> when called for pt_idx values from level 1):
>
> For pt_idx=0 we want to fill the second half of the level 2 page directory
> with page tables:
> from = 256
> to = 511
>
> For pt_idx=1 we need to fill the entire level 2 page directory with page
> tables:
> from = 0
> to = 511
>
> For pt_idx=2 we want to fill the first half of the level 2 page directory
> with page tables:
> from = 0
> to = 255
>
> This creates a contiguous range of page tables across the three level 2 page
> directories that are spanned by the allocation. My change achieves that.
>
> I think you're right, there are some extra high bits in saddr and eaddr, but
> they get discarded by the modulo-division in the next recursion level. For
> added clarity the modulo division (or masking of high bits) could be moved up
> to the caller.
>
> Regards,
>   Felix
>
> --
> F e l i x   K u e h l i n g
> SMTS Software Development Engineer | Vertical Workstation/Compute
> 1 Commerce Valley Dr. East, Markham, ON L3T 7X6 Canada
> (O) +1(289)695-1597
> facebook.com/AMD | amd.com
>
> -------------------------------------------------------------------------------
> *From:* Zhang, Jerry
> *Sent:* Tuesday, March 28, 2017 10:54:03 PM
> *To:* Kuehling, Felix; amd-gfx at lists.freedesktop.org; Koenig, Christian;
> Zhou, David(ChunMing)
> *Cc:* Deucher, Alexander; Russell, Kent
> *Subject:* Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for
> large BOs
>
> On 03/29/2017 09:00 AM, Felix Kuehling wrote:
>> Fix the start/end address calculation for address ranges that span
>> multiple page directories in amdgpu_vm_alloc_levels.
>>
>> Add WARN_ONs if page tables aren't found. Otherwise the page table
>> update would just fail silently.
>>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 818747f..44aa039 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -278,6 +278,8 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>  	from = (saddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>  	to = (eaddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>
>> +	WARN_ON(from > to);
>> +
>>  	if (to > parent->last_entry_used)
>>  		parent->last_entry_used = to;
>>
>> @@ -312,8 +314,11 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>  	}
>>
>>  	if (level < adev->vm_manager.num_level) {
>> -		r = amdgpu_vm_alloc_levels(adev, vm, entry, saddr,
>> -					   eaddr, level);
>> +		uint64_t sub_saddr = (pt_idx == from) ? saddr : 0;
>> +		uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
>> +			((1 << shift) - 1);
>> +		r = amdgpu_vm_alloc_levels(adev, vm, entry, sub_saddr,
>> +					   sub_eaddr, level);
>
> Do you mean to create a full sub-PT if pt_idx != from and != to?
> I didn't quite catch that.
>
> From my perspective, we may need to clear the upper-level bits of saddr and
> eaddr before going on to allocate the next level PT.
> Otherwise, the next PT entry index will be incorrect, larger than
> num_entries, because the upper-level PT bits are included in the calculation.
>
> e.g.
> an address range covers
> PD + PT1 + PT2 + PT3
>
> In the first round, we get the correct "from" and "to" addresses with the
> right "shift" (3 blocks), then walk over the address space in the PD range.
>
> Then in the 2nd round, we cannot get the expected "from" and "to" addresses
> with "shift" (2 blocks), as it walks over the address space in the PD + PT1
> range, while we'd like it to be in the PT1 range only.
>
> The fault carries on through the later rounds.
>
> Perhaps that's the root cause of the issue you're facing.
>
> Jerry.
>
>>  		if (r)
>>  			return r;
>>  	}
>> @@ -957,9 +962,11 @@ static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
>>  		entry = &entry->entries[idx];
>>  	}
>>
>> -	if (level)
>> +	if (WARN_ON(level))
>>  		return NULL;
>>
>> +	WARN_ON(!entry->bo);
>> +
>>  	return entry->bo;
>>  }