On 03/29/2017 02:52 PM, Christian König wrote:
> On 29.03.2017 at 07:58, Zhang, Jerry (Junwei) wrote:
>> Hi Felix,
>>
>> Thanks for your patient illustration.
>> I get your meaning now, and thanks for fixing this.
>>
>> > I think you're right, there are some extra high bits in saddr and eaddr, but
>> > they get discarded by the modulo-division in the next recursion level. For
>> > added clarity the modulo division (or masking of high bits) could be moved up
>> > to the caller.
>>
>> It may be clearer if the modulo division is done by the caller.
>
> Yeah, indeed that isn't easy to get at first glance. Understood :)
>
> Please clean that up and also remove all those WARN_ON()s; we don't want to spam
> the system log with backtraces like this.
>
> Regards,
> Christian.
>
>> Anyway,
>> Reviewed-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>
>>
>> On 03/29/2017 11:24 AM, Kuehling, Felix wrote:
>>> Hi Jerry,
>>>
>>> Let me clarify my understanding of the problem and the intention of the fix.
>>>
>>> Address range per page: 4KB
>>> Address range per PT (level 3): 2MB
>>> Address range per PD (level 2): 1GB
>>> Address range per PD (level 1): 512GB
>>> Address range per PD (level 0): 256TB
>>>
>>> Imagine, for example, a mapping that spans multiple level 2 page directories,
>>> say from 0.5GB to 2.5GB.
>>>
>>> At level 0:
>>> from = 0
>>> to = 0
>>>
>>> At level 1:
>>> from = 0
>>> to = 2
>>>
>>> So for level 2, amdgpu_vm_alloc_levels gets called 3 times (pt_idx = [0..2]).
>>> Without my change, it gets called with the same saddr and eaddr all 3 times,
>>> so it calculates the same "from" and "to" each time:
>>> from = 256 % 512 = 256
>>> to = 1279 % 512 = 255
>>>
>>> Obviously that doesn't work. What we really want is this (from..to in level 2
>>> when called for pt_idx values from level 1):
>>>
>>> For pt_idx=0 we want to fill the second half of the level 2 page directory
>>> with page tables:
>>> from = 256
>>> to = 511
>>>
>>> For pt_idx=1 we need to fill the entire level 2 page directory with page
>>> tables:
>>> from = 0
>>> to = 511
>>>
>>> For pt_idx=2 we want to fill the first half of the level 2 page directory
>>> with page tables:
>>> from = 0
>>> to = 255
>>>
>>> This creates a contiguous range of page tables across the three level 2 page
>>> directories that are spanned by the allocation. My change achieves that.
>>>
>>> I think you're right, there are some extra high bits in saddr and eaddr, but
>>> they get discarded by the modulo-division in the next recursion level. For
>>> added clarity the modulo division (or masking of high bits) could be moved up
>>> to the caller.
>>>
>>> Regards,
>>>   Felix
>>>
>>>
>>> --
>>> F e l i x   K u e h l i n g
>>> SMTS Software Development Engineer | Vertical Workstation/Compute
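
To make the arithmetic above easy to check, here is a minimal standalone
sketch. It is not the kernel code; the constants are assumptions taken from
the example (4 levels, 512 entries and 9 address bits per level, addresses
counted in 4KB pages). It reproduces the per-level from/to values Felix
lists, including the broken pair at level 2:

#include <stdio.h>
#include <stdint.h>

#define ENTRIES    512   /* assumed entries per level          */
#define NUM_LEVELS 4     /* levels 0..3, as in the table above */

int main(void)
{
	/* mapping 0.5GB..2.5GB, inclusive end, in units of 4KB pages */
	uint64_t saddr = (1ULL << 29) >> 12;
	uint64_t eaddr = ((5ULL << 29) >> 12) - 1;
	int level;

	for (level = 0; level < NUM_LEVELS; level++) {
		/* each level consumes 9 bits; level 0 uses the highest */
		unsigned int shift = (NUM_LEVELS - 1 - level) * 9;
		uint64_t from = (saddr >> shift) % ENTRIES;
		uint64_t to = (eaddr >> shift) % ENTRIES;

		printf("level %d: from = %llu, to = %llu\n", level,
		       (unsigned long long)from, (unsigned long long)to);
	}
	return 0;
}

At level 2 this prints from = 256, to = 255, an empty range, which is why
the page-table allocation silently goes wrong without the fix.
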
>>> -------------------------------------------------------------------------------
>>> *From:* Zhang, Jerry
>>> *Sent:* Tuesday, March 28, 2017 10:54:03 PM
>>> *To:* Kuehling, Felix; amd-gfx at lists.freedesktop.org; Koenig, Christian;
>>> Zhou, David(ChunMing)
>>> *Cc:* Deucher, Alexander; Russell, Kent
>>> *Subject:* Re: [PATCH 3/3] drm/amdgpu: Fix multi-level page table bugs for
>>> large BOs
>>>
>>> On 03/29/2017 09:00 AM, Felix Kuehling wrote:
>>>> Fix the start/end address calculation for address ranges that span
>>>> multiple page directories in amdgpu_vm_alloc_levels.
>>>>
>>>> Add WARN_ONs if page tables aren't found. Otherwise the page table
>>>> update would just fail silently.
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 ++++++++++---
>>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 818747f..44aa039 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -278,6 +278,8 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>>>  	from = (saddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>>>  	to = (eaddr >> shift) % amdgpu_vm_num_entries(adev, level);
>>>>
>>>> +	WARN_ON(from > to);
>>>> +
>>>>  	if (to > parent->last_entry_used)
>>>>  		parent->last_entry_used = to;
>>>>
>>>> @@ -312,8 +314,11 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
>>>>  	}
>>>>
>>>>  	if (level < adev->vm_manager.num_level) {
>>>> -		r = amdgpu_vm_alloc_levels(adev, vm, entry, saddr,
>>>> -					   eaddr, level);
>>>> +		uint64_t sub_saddr = (pt_idx == from) ? saddr : 0;
>>>> +		uint64_t sub_eaddr = (pt_idx == to) ? eaddr :
>>>> +			((1 << shift) - 1);
>>>> +		r = amdgpu_vm_alloc_levels(adev, vm, entry, sub_saddr,
>>>> +					   sub_eaddr, level);
>>>
>>> Do you mean to create a full sub-PT if pt_idx != from and pt_idx != to?
>>> I didn't quite catch that.
>>>
>>> From my perspective, we may need to clear the upper-level bits of saddr and
>>> eaddr before going on to allocate the next-level PT.
>>> Otherwise the next-level "from" and "to" will be incorrect, exceeding
>>> num_entries, because the upper-level PT bits are included in the calculation.
>>>
>>> e.g.
>>> an address consists of
>>> PD + PT1 + PT2 + PT3
>>>
>>> For the first round, we get the correct "from" and "to" with the right "shift"
>>> (3 blocks), then walk over the address space in the PD range.
>>>
>>> For the 2nd round, we cannot get the expected "from" and "to" with "shift"
>>> (2 blocks), as it walks over the address space in the PD + PT1 range, while
>>> we'd like it to be in the PT1 range only.
>>>
>>> The faulty calculation goes on in the later rounds.
>>>
>>> Perhaps that's the root cause of the issue you faced.
>>>
>>> Jerry.
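
To make Jerry's point concrete: his masking suggestion and the clamping from
the patch can be combined in one recursion. The sketch below is illustrative
only and is not the driver function; alloc_levels, ENTRIES and the fixed 9
bits per level are assumptions, and the BO allocation and parent-entry
bookkeeping of the real amdgpu_vm_alloc_levels are omitted:

#include <stdint.h>

#define ENTRIES 512   /* assumed entries per level */

/* Recurse over the levels for the inclusive page range [saddr..eaddr].
 * Masking off the bits the current level consumed means the next level
 * computes its from/to from its own 9 bits only. */
static void alloc_levels(uint64_t saddr, uint64_t eaddr,
			 unsigned int shift, int level, int num_levels)
{
	uint64_t from = (saddr >> shift) % ENTRIES;
	uint64_t to = (eaddr >> shift) % ENTRIES;
	uint64_t pt_idx;

	for (pt_idx = from; pt_idx <= to; pt_idx++) {
		if (level + 1 < num_levels) {
			uint64_t mask = (1ULL << shift) - 1;

			/* the first entry starts at saddr and the last
			 * ends at eaddr; everything in between spans its
			 * full range */
			alloc_levels(pt_idx == from ? (saddr & mask) : 0,
				     pt_idx == to ? (eaddr & mask) : mask,
				     shift - 9, level + 1, num_levels);
		}
	}
}
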
>>>
>>>>  		if (r)
>>>>  			return r;
>>>>  	}
>>>> @@ -957,9 +962,11 @@ static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
>>>>  		entry = &entry->entries[idx];
>>>>  	}
>>>>
>>>> -	if (level)
>>>> +	if (WARN_ON(level))
>>>>  		return NULL;
>>>>
>>>> +	WARN_ON(!entry->bo);
>>>> +
>>>>  	return entry->bo;
>>>>  }
>>>>
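
As a final sanity check of the fixed range splitting, the sketch below (same
assumed geometry and illustrative names as above, not the driver code) counts
the level 3 page tables that the 0.5GB..2.5GB example should allocate. It
prints 1024, i.e. 2GB / 2MB, one contiguous run across the three level 2
page directories, matching Felix's explanation:

#include <stdio.h>
#include <stdint.h>

#define ENTRIES    512
#define NUM_LEVELS 4

static uint64_t page_tables;

static void alloc_levels(uint64_t saddr, uint64_t eaddr,
			 unsigned int shift, int level)
{
	uint64_t from = (saddr >> shift) % ENTRIES;
	uint64_t to = (eaddr >> shift) % ENTRIES;
	uint64_t pt_idx;

	/* the entries of a level 2 page directory are the level 3
	 * page tables, so just count them here */
	if (level == NUM_LEVELS - 2) {
		page_tables += to - from + 1;
		return;
	}

	for (pt_idx = from; pt_idx <= to; pt_idx++) {
		uint64_t mask = (1ULL << shift) - 1;

		alloc_levels(pt_idx == from ? (saddr & mask) : 0,
			     pt_idx == to ? (eaddr & mask) : mask,
			     shift - 9, level + 1);
	}
}

int main(void)
{
	uint64_t saddr = (1ULL << 29) >> 12;       /* 0.5GB in 4KB pages    */
	uint64_t eaddr = ((5ULL << 29) >> 12) - 1; /* last page below 2.5GB */

	alloc_levels(saddr, eaddr, (NUM_LEVELS - 1) * 9, 0);
	printf("page tables: %llu\n", (unsigned long long)page_tables);
	/* expected: 1024 = 2GB span / 2MB per page table */
	return 0;
}
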