On 2018-09-12 02:56 AM, Christian König wrote:
> On 12.09.2018 at 00:00, Felix Kuehling wrote:
>> On 2018-09-11 03:19 AM, Christian König wrote:
>>> Hi Felix,
>>>
>>> let me try to explain the problem with an example:
>>>
>>> 1. We have a running job which needs recoverable page faults for
>>> accessing the process address space.
>>> 2. We run into memory pressure on VRAM and start to evict things.
>>> 3. A page table of the running job is picked up for eviction.
>>> 4. We schedule a copy command to move the content of the page table to
>>> system memory.
>> I think we should change this. If you evict a page table, you don't need
>> to preserve its contents. You should be able to restore the page table
>> contents from scratch when you handle the page fault that restores it.
>
> Yeah, already implemented. You actually don't need the page fault for
> that.
>
>>> 5. The new system memory location of the page table is noted in its BO.
>> You mean in the parent page table? You can just invalidate the entry in
>> the parent page table and let it fault.
>
> I'm repeating myself, but exactly that is what won't work.
>
> See, we still have engines which can't handle page faults but use
> the same VM at the same time. This means that we can't just fault in
> page tables.

And I don't understand why that is a problem. Those clients rely on
fences to keep their BOs resident, including the page tables. Are you
planning to change that?

Regards,
  Felix

>
> What we could do is to separate the address space into a low and a
> high range where we have different handling for both.
>
> I've already prototyped this and prepared mostly everything necessary
> for that, but it is still not 100% complete.
>
> Regards,
> Christian.
>
>>> 6. We get a fault from the job and swap in the page from the process
>>> address space.
>> No. Then you need to keep updating page tables that are swapped out.
>> Instead just discard them and rebuild on fault.
>>
>>> 7.
>>> Now we need to enter the new page address into the page table ->
>>> *BAM*
>>>
>>> The problem is now that we don't know the address of the page table,
>>> because the current location was replaced with the future location in
>>> step #5.
>> You solve that problem by discarding page tables that are swapped out.
>> Then there is no future location.
>>
>>> We could now argue that we could update the page tables on the fly for
>>> the evicted page and never wait for the current job to finish, but
>>> this won't work because not all engines can handle that.
>> Engines that can't handle page faults will depend on fences to keep
>> their page tables resident, like we do today. So they won't have this
>> problem.
>>
>> Regards,
>>   Felix
>>
>>> I will circumvent this problem for now by blocking the eviction before
>>> step #5. The issue with that is that this pipelining is responsible
>>> for nearly 20% of the fps in some testcases.
>>>
>>> I hope that it won't be that bad when I limit disabling the pipelining
>>> to only page tables, but it is certainly possible that we will get
>>> pushback on this.
>>>
>>> If that happens we probably need to add some flags to limit this
>>> workaround even more to only the root PD and all the PDs/PTs which are
>>> involved in recoverable faults.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 10.09.2018 at 21:10, Felix Kuehling wrote:
>>>> I'm not sure why you need to distinguish current and future state when
>>>> dealing with page faults. When you get a page fault, you know that the
>>>> GPU is trying to access memory right now, in the present. So you're
>>>> always working with the current state. When the CPU page table changes,
>>>> you get an MMU notifier that allows you to invalidate the corresponding
>>>> GPU PTEs. If you have a valid GPU PTE, it always represents the current
>>>> state and is in sync with the CPU page table.
>>>> If the GPU page table is
>>>> ever outdated, it should have an invalid entry in that place.
>>>>
>>>> If SDMA is used to update the GPU PTEs, there is a delay. The MMU
>>>> notifier is synchronous, so it shouldn't be a problem. You just wait for
>>>> the SDMA job to complete before returning. When updating PTEs with new
>>>> valid addresses, it's asynchronous. But the GPU will continue retrying
>>>> on the invalid entry until SDMA finishes. So it's also implicitly
>>>> synchronized on the GPU side.
>>>>
>>>> Regards,
>>>>    Felix
>>>>
>>>>
>>>> On 2018-09-10 05:42 AM, Christian König wrote:
>>>>> Hi Felix & Oak,
>>>>>
>>>>> over the weekend I had the idea that we could just use the shadow BOs
>>>>> to have the current state in a page fault. They are GTT BOs and CPU
>>>>> accessible anyway.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> On 08.09.2018 at 09:34, Christian König wrote:
>>>>>> Hi Felix,
>>>>>>
>>>>>>> But why do you want to update page tables when there is no more user
>>>>>>> mode context that cares about them? Is this just to allow pending work
>>>>>>> to complete after the user mode context is gone? To prevent hangs?
>>>>>> The problem I see is that we still don't have a reliable way of
>>>>>> killing a command submission while making sure that no more faults from
>>>>>> it are in flight.
>>>>>>
>>>>>> E.g. the HWS has a function to kill a running job on the hardware,
>>>>>> but that only works for compute and there is no way of telling when
>>>>>> that operation has finished.
>>>>>>
>>>>>> But Oak has already convinced me that we should probably work on that
>>>>>> instead of trying to keep the VM alive as long as possible.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> On 08.09.2018 at 02:27, Felix Kuehling wrote:
>>>>>>> Hi Christian,
>>>>>>>
>>>>>>> I'm not sure I get your point here.
>>>>>>>
>>>>>>> As I understand it, the amdgpu_vm structure represents the user mode VM
>>>>>>> context. It has the pointers to the root page directory and the rest of
>>>>>>> the page table hierarchy. If there is no amdgpu_vm, there is no user
>>>>>>> mode context that cares about the page tables.
>>>>>>>
>>>>>>> If we leave the page table pointers in the amdgpu_vm, then handling
>>>>>>> faults after the VM is destroyed is pointless. We can't find the page
>>>>>>> tables and we can't update them, so there is nothing we can do in
>>>>>>> response to VM faults.
>>>>>>>
>>>>>>> So I'm guessing that you want to move the page table pointers into a
>>>>>>> different structure that exists half-independently of the VM. It would
>>>>>>> be created when the VM is created (that's where we currently allocate
>>>>>>> the PASID) but can be destroyed slightly later.
>>>>>>>
>>>>>>> But why do you want to update page tables when there is no more user
>>>>>>> mode context that cares about them? Is this just to allow pending work
>>>>>>> to complete after the user mode context is gone? To prevent hangs?
>>>>>>>
>>>>>>> Regards,
>>>>>>>     Felix
>>>>>>>
>>>>>>> On 2018-09-10 07:20 AM, Christian König wrote:
>>>>>>>>> Am I right that the handling of a page fault needs a valid VM?
>>>>>>>> No, exactly that view is incorrect.
>>>>>>>>
>>>>>>>> The VM is meant to be a memory management structure of page tables and
>>>>>>>> is completely pointless for fault processing, because it represents the
>>>>>>>> future state of the page tables and not the current one.
>>>>>>>>
>>>>>>>> What we need for fault processing is a new structure built around
>>>>>>>> PASIDs which is fed with the new addresses when page tables are moved
>>>>>>>> around.
>>>>>>>>
>>>>>>>> Alternatively I hope that we can use the SDMA to walk the page tables
>>>>>>>> and update the required entries by just using the address.
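The PASID-keyed structure Christian describes above can be sketched as a tiny userspace model: a table that always holds the *current* address of each process's page tables, updated whenever the eviction code moves one, and queried by the fault handler. All names here (`pt_moved`, `pt_current_addr`, the table layout) are made up for illustration; this is not driver code.

```c
#include <stdint.h>

/* Hypothetical model: a small PASID-indexed table holding the current
 * GPU address of each root page table. The eviction path calls
 * pt_moved() whenever it relocates a page-table BO; the fault handler
 * calls pt_current_addr() and never looks at the VM's future state.
 */
#define MAX_PASIDS 16

struct pt_location {
	uint32_t pasid;    /* process address space ID */
	uint64_t pt_addr;  /* current address of the root page table */
	int      valid;
};

static struct pt_location table[MAX_PASIDS];

/* Record a page-table move (update in place, or claim a free slot). */
static void pt_moved(uint32_t pasid, uint64_t new_addr)
{
	for (int i = 0; i < MAX_PASIDS; i++) {
		if (table[i].valid && table[i].pasid == pasid) {
			table[i].pt_addr = new_addr;
			return;
		}
	}
	for (int i = 0; i < MAX_PASIDS; i++) {
		if (!table[i].valid) {
			table[i] = (struct pt_location){ pasid, new_addr, 1 };
			return;
		}
	}
}

/* Fault-handler side: current page-table address, or 0 if unknown. */
static uint64_t pt_current_addr(uint32_t pasid)
{
	for (int i = 0; i < MAX_PASIDS; i++)
		if (table[i].valid && table[i].pasid == pasid)
			return table[i].pt_addr;
	return 0;
}
```

The point of the design is that the fault path only ever consults this side table, so it stays correct even while the VM structure already reflects a pending (future) move.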
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>> On 07.09.2018 at 16:55, Zeng, Oak wrote:
>>>>>>>>> Hi Christian,
>>>>>>>>>
>>>>>>>>> What is the value of delaying the destruction of the hash until after
>>>>>>>>> the VM is destroyed? Once the VM is destroyed, you basically don't
>>>>>>>>> have enough information to page the correct page into the GPUVM. Am I
>>>>>>>>> right that the handling of a page fault needs a valid VM? For example,
>>>>>>>>> you need the VM to get the corresponding physical address of the
>>>>>>>>> faulting page.
>>>>>>>>>
>>>>>>>>> After the VM is destroyed, the retry interrupt can be directly
>>>>>>>>> discarded, as you can't find the VM through the PASID. You can think
>>>>>>>>> of this as one kind of prescreening as well.
>>>>>>>>>
>>>>>>>>> So I stand by the point that there is no need to delay the destruction
>>>>>>>>> of the hash until after the VM is destroyed. The prescreening hash can
>>>>>>>>> be destroyed at vm_fini time.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Oak
>>>>>>>>>
>>>>>>>>> *From:* Christian König <ckoenig.leichtzumerken at gmail.com>
>>>>>>>>> *Sent:* Friday, September 07, 2018 4:39 AM
>>>>>>>>> *To:* Zeng, Oak <Oak.Zeng at amd.com>; Oak Zeng <zengshanjun at gmail.com>;
>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>> *Cc:* Zeng, Oak <Oak.Zeng at amd.com>
>>>>>>>>> *Subject:* Re: [PATCH 1/2] drm/amdgpu: Moved fault hash table to
>>>>>>>>> amdgpu vm
>>>>>>>>>
>>>>>>>>> Hi Oak,
>>>>>>>>>
>>>>>>>>> yeah, but this is still a NAK. Making the hash per PASID was not a
>>>>>>>>> suggestion but rather a must have.
>>>>>>>>>
>>>>>>>>> The VM structures must be destroyable while processing is still
>>>>>>>>> ongoing, or otherwise we would not have clean OOM handling.
>>>>>>>>>
>>>>>>>>> The IDR for PASID lookup can be put into amdgpu_ids.c; you can just
>>>>>>>>> replace the IDA already used there with an IDR.
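The IDA-to-IDR change Christian suggests boils down to this: an IDA only allocates unique integer IDs, while an IDR additionally maps each ID back to a pointer, which is exactly what the fault path needs to go from a PASID to per-process state. A toy userspace model of the IDR side (a fixed-size array standing in for the kernel's radix-tree-backed IDR; the `model_idr_*` names are made up):

```c
#include <stddef.h>

/* Toy stand-in for the kernel IDR: allocate the lowest free ID and
 * bind it to a pointer, so the ID can later be resolved back.
 */
#define MODEL_IDR_SIZE 64

static void *idr_slots[MODEL_IDR_SIZE];

/* Allocate the lowest free ID and bind it to @ptr (like idr_alloc). */
static int model_idr_alloc(void *ptr)
{
	for (int id = 0; id < MODEL_IDR_SIZE; id++) {
		if (!idr_slots[id]) {
			idr_slots[id] = ptr;
			return id;
		}
	}
	return -1; /* no free IDs */
}

/* Resolve @id back to its pointer (like idr_find). */
static void *model_idr_find(int id)
{
	if (id < 0 || id >= MODEL_IDR_SIZE)
		return NULL;
	return idr_slots[id];
}

/* Release @id for reuse (like idr_remove). */
static void model_idr_remove(int id)
{
	if (id >= 0 && id < MODEL_IDR_SIZE)
		idr_slots[id] = NULL;
}
```

An IDA would only provide the first operation; it is the `find` step that makes PASID-based fault processing possible after the change.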
>>>>>>>>>
>>>>>>>>> Since the PASID is already freed with a delay, we should have the
>>>>>>>>> grace period for interrupt processing included. If that still isn't
>>>>>>>>> sufficient we can still add some delayed work for this.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 07.09.2018 at 06:16, ozeng wrote:
>>>>>>>>>
>>>>>>>>>       Hi Christian,
>>>>>>>>>
>>>>>>>>>       In this implementation, the fault hash is made per VM, not per
>>>>>>>>>       PASID as suggested, based on the following considerations:
>>>>>>>>>
>>>>>>>>>         * Delaying the destruction of the hash introduces more effort,
>>>>>>>>>           like how to set the proper grace period after which no retry
>>>>>>>>>           interrupt will be generated. We want to avoid that
>>>>>>>>>           complication if we can.
>>>>>>>>>         * The problem of the per-VM implementation is that it is hard
>>>>>>>>>           to delay the destruction of the fault hash, because when the
>>>>>>>>>           VM is destroyed, the prescreen function won't be able to
>>>>>>>>>           retrieve the fault hash. But in this case, the prescreen
>>>>>>>>>           function can either let the interrupt go through (to gmc) or
>>>>>>>>>           ignore it. From this perspective, it is not really necessary
>>>>>>>>>           to delay the destruction of the hash table.
>>>>>>>>>         * On the other hand, ASICs after vega10 have the HW CAM
>>>>>>>>>           feature, so the SW IV prescreen is only useful for vega10.
>>>>>>>>>           For all the HW-implemented CAMs, we can consider delayed
>>>>>>>>>           CAM acknowledgment.
>>>>>>>>>         * Making it per PASID requires introducing another IDR to
>>>>>>>>>           correlate PASID and hash table - where to put the IDR? You
>>>>>>>>>           will have to make it a global variable, which is not very
>>>>>>>>>           desirable.
>>>>>>>>>
>>>>>>>>>       The main purpose of this patch is to solve the inter-process
>>>>>>>>>       lock issue (when the hash table is full), while still keeping
>>>>>>>>>       the code simple.
>>>>>>>>>
>>>>>>>>>       Also with this patch, the faults kfifo is not necessary any
>>>>>>>>>       more. See patch 2.
>>>>>>>>>
>>>>>>>>>       Regards,
>>>>>>>>>
>>>>>>>>>       Oak
>>>>>>>>>
>>>>>>>>>       On 09/06/2018 11:28 PM, Oak Zeng wrote:
>>>>>>>>>
>>>>>>>>>           Instead of sharing one fault hash table per device, make it
>>>>>>>>>           per VM. This can avoid an inter-process lock issue when the
>>>>>>>>>           fault hash table is full.
>>>>>>>>>
>>>>>>>>>           Change-Id: I5d1281b7c41eddc8e26113e010516557588d3708
>>>>>>>>>           Signed-off-by: Oak Zeng <Oak.Zeng at amd.com>
>>>>>>>>>           Suggested-by: Christian König <Christian.Koenig at amd.com>
>>>>>>>>>           Suggested-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>>>>>>>           ---
>>>>>>>>>            drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c |  75 ------------------------
>>>>>>>>>            drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h |  10 ----
>>>>>>>>>            drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 102 ++++++++++++++++++++++++++++++++-
>>>>>>>>>            drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  12 ++++
>>>>>>>>>            drivers/gpu/drm/amd/amdgpu/vega10_ih.c |  38 +++++-------
>>>>>>>>>            5 files changed, 127 insertions(+), 110 deletions(-)
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>>>>>>> index 06373d4..4ed8621 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
>>>>>>>>> @@ -197,78 +197,3 @@ int amdgpu_ih_process(struct amdgpu_device *adev)
>>>>>>>>>  	return IRQ_HANDLED;
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> -/**
>>>>>>>>> - * amdgpu_ih_add_fault - Add a page fault record
>>>>>>>>> - *
>>>>>>>>> - * @adev: amdgpu device pointer
>>>>>>>>> - * @key: 64-bit encoding of PASID and address
>>>>>>>>> - *
>>>>>>>>> - * This should be called when a retry page fault interrupt is
>>>>>>>>> - * received. If this is a new page fault, it will be added to a hash
>>>>>>>>> - * table. The return value indicates whether this is a new fault, or
>>>>>>>>> - * a fault that was already known and is already being handled.
>>>>>>>>> - *
>>>>>>>>> - * If there are too many pending page faults, this will fail. Retry
>>>>>>>>> - * interrupts should be ignored in this case until there is enough
>>>>>>>>> - * free space.
>>>>>>>>> - *
>>>>>>>>> - * Returns 0 if the fault was added, 1 if the fault was already known,
>>>>>>>>> - * -ENOSPC if there are too many pending faults.
>>>>>>>>> - */
>>>>>>>>> -int amdgpu_ih_add_fault(struct amdgpu_device *adev, u64 key)
>>>>>>>>> -{
>>>>>>>>> -	unsigned long flags;
>>>>>>>>> -	int r = -ENOSPC;
>>>>>>>>> -
>>>>>>>>> -	if (WARN_ON_ONCE(!adev->irq.ih.faults))
>>>>>>>>> -		/* Should be allocated in <IP>_ih_sw_init on GPUs that
>>>>>>>>> -		 * support retry faults and require retry filtering.
>>>>>>>>> -		 */
>>>>>>>>> -		return r;
>>>>>>>>> -
>>>>>>>>> -	spin_lock_irqsave(&adev->irq.ih.faults->lock, flags);
>>>>>>>>> -
>>>>>>>>> -	/* Only let the hash table fill up to 50% for best performance */
>>>>>>>>> -	if (adev->irq.ih.faults->count >= (1 << (AMDGPU_PAGEFAULT_HASH_BITS-1)))
>>>>>>>>> -		goto unlock_out;
>>>>>>>>> -
>>>>>>>>> -	r = chash_table_copy_in(&adev->irq.ih.faults->hash, key, NULL);
>>>>>>>>> -	if (!r)
>>>>>>>>> -		adev->irq.ih.faults->count++;
>>>>>>>>> -
>>>>>>>>> -	/* chash_table_copy_in should never fail unless we're losing count */
>>>>>>>>> -	WARN_ON_ONCE(r < 0);
>>>>>>>>> -
>>>>>>>>> -unlock_out:
>>>>>>>>> -	spin_unlock_irqrestore(&adev->irq.ih.faults->lock, flags);
>>>>>>>>> -	return r;
>>>>>>>>> -}
>>>>>>>>> -
>>>>>>>>> -/**
>>>>>>>>> - * amdgpu_ih_clear_fault - Remove a page fault record
>>>>>>>>> - *
>>>>>>>>> - * @adev: amdgpu device pointer
>>>>>>>>> - * @key: 64-bit encoding of PASID and address
>>>>>>>>> - *
>>>>>>>>> - * This should be called when a page fault has been handled. Any
>>>>>>>>> - * future interrupt with this key will be processed as a new
>>>>>>>>> - * page fault.
>>>>>>>>> - */
>>>>>>>>> -void amdgpu_ih_clear_fault(struct amdgpu_device *adev, u64 key)
>>>>>>>>> -{
>>>>>>>>> -	unsigned long flags;
>>>>>>>>> -	int r;
>>>>>>>>> -
>>>>>>>>> -	if (!adev->irq.ih.faults)
>>>>>>>>> -		return;
>>>>>>>>> -
>>>>>>>>> -	spin_lock_irqsave(&adev->irq.ih.faults->lock, flags);
>>>>>>>>> -
>>>>>>>>> -	r = chash_table_remove(&adev->irq.ih.faults->hash, key, NULL);
>>>>>>>>> -	if (!WARN_ON_ONCE(r < 0)) {
>>>>>>>>> -		adev->irq.ih.faults->count--;
>>>>>>>>> -		WARN_ON_ONCE(adev->irq.ih.faults->count < 0);
>>>>>>>>> -	}
>>>>>>>>> -
>>>>>>>>> -	spin_unlock_irqrestore(&adev->irq.ih.faults->lock, flags);
>>>>>>>>> -}
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>>>>>> index a23e1c0..f411ffb 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h
>>>>>>>>> @@ -32,13 +32,6 @@ struct amdgpu_device;
>>>>>>>>>  #define AMDGPU_IH_CLIENTID_LEGACY	0
>>>>>>>>>  #define AMDGPU_IH_CLIENTID_MAX	SOC15_IH_CLIENTID_MAX
>>>>>>>>>  
>>>>>>>>> -#define AMDGPU_PAGEFAULT_HASH_BITS 8
>>>>>>>>> -struct amdgpu_retryfault_hashtable {
>>>>>>>>> -	DECLARE_CHASH_TABLE(hash, AMDGPU_PAGEFAULT_HASH_BITS, 8, 0);
>>>>>>>>> -	spinlock_t	lock;
>>>>>>>>> -	int		count;
>>>>>>>>> -};
>>>>>>>>> -
>>>>>>>>>  /*
>>>>>>>>>   * R6xx+ IH ring
>>>>>>>>>   */
>>>>>>>>> @@ -57,7 +50,6 @@ struct amdgpu_ih_ring {
>>>>>>>>>  	bool			use_doorbell;
>>>>>>>>>  	bool			use_bus_addr;
>>>>>>>>>  	dma_addr_t		rb_dma_addr; /* only used when use_bus_addr = true */
>>>>>>>>> -	struct amdgpu_retryfault_hashtable *faults;
>>>>>>>>>  };
>>>>>>>>>  
>>>>>>>>>  #define AMDGPU_IH_SRC_DATA_MAX_SIZE_DW 4
>>>>>>>>> @@ -95,7 +87,5 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, unsigned ring_size,
>>>>>>>>>  			bool use_bus_addr);
>>>>>>>>>  void amdgpu_ih_ring_fini(struct amdgpu_device *adev);
>>>>>>>>>  int amdgpu_ih_process(struct amdgpu_device *adev);
>>>>>>>>> -int amdgpu_ih_add_fault(struct amdgpu_device *adev, u64 key);
>>>>>>>>> -void amdgpu_ih_clear_fault(struct amdgpu_device *adev, u64 key);
>>>>>>>>>  
>>>>>>>>>  #endif
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> index 1d7e3c1..8b220e9 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>>>>>>> @@ -2692,6 +2692,22 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint32_t min_vm_size,
>>>>>>>>>  		 adev->vm_manager.fragment_size);
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +static struct amdgpu_retryfault_hashtable *init_fault_hash(void)
>>>>>>>>> +{
>>>>>>>>> +	struct amdgpu_retryfault_hashtable *fault_hash;
>>>>>>>>> +
>>>>>>>>> +	fault_hash = kmalloc(sizeof(*fault_hash), GFP_KERNEL);
>>>>>>>>> +	if (!fault_hash)
>>>>>>>>> +		return fault_hash;
>>>>>>>>> +
>>>>>>>>> +	INIT_CHASH_TABLE(fault_hash->hash,
>>>>>>>>> +			AMDGPU_PAGEFAULT_HASH_BITS, 8, 0);
>>>>>>>>> +	spin_lock_init(&fault_hash->lock);
>>>>>>>>> +	fault_hash->count = 0;
>>>>>>>>> +
>>>>>>>>> +	return fault_hash;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>  /**
>>>>>>>>>   * amdgpu_vm_init - initialize a vm instance
>>>>>>>>>   *
>>>>>>>>> @@ -2780,6 +2796,12 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>>>>>>  		vm->pasid = pasid;
>>>>>>>>>  	}
>>>>>>>>>  
>>>>>>>>> +	vm->fault_hash = init_fault_hash();
>>>>>>>>> +	if (!vm->fault_hash) {
>>>>>>>>> +		r = -ENOMEM;
>>>>>>>>> +		goto error_free_root;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>>  	INIT_KFIFO(vm->faults);
>>>>>>>>>  	vm->fault_credit = 16;
>>>>>>>>>  
>>>>>>>>> @@ -2973,7 +2995,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>>>>>>>  
>>>>>>>>>  	/* Clear pending page faults from IH when the VM is destroyed */
>>>>>>>>>  	while (kfifo_get(&vm->faults, &fault))
>>>>>>>>> -		amdgpu_ih_clear_fault(adev, fault);
>>>>>>>>> +		amdgpu_vm_clear_fault(vm->fault_hash, fault);
>>>>>>>>>  
>>>>>>>>>  	if (vm->pasid) {
>>>>>>>>>  		unsigned long flags;
>>>>>>>>> @@ -2983,6 +3005,9 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>>>>>>>  		spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);
>>>>>>>>>  	}
>>>>>>>>>  
>>>>>>>>> +	kfree(vm->fault_hash);
>>>>>>>>> +	vm->fault_hash = NULL;
>>>>>>>>> +
>>>>>>>>>  	drm_sched_entity_destroy(&vm->entity);
>>>>>>>>>  
>>>>>>>>>  	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
>>>>>>>>> @@ -3183,3 +3208,78 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>>>>>>>>  		}
>>>>>>>>>  	}
>>>>>>>>>  }
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * amdgpu_vm_add_fault - Add a page fault record to fault hash table
>>>>>>>>> + *
>>>>>>>>> + * @fault_hash: fault hash table
>>>>>>>>> + * @key: 64-bit encoding of PASID and address
>>>>>>>>> + *
>>>>>>>>> + * This should be called when a retry page fault interrupt is
>>>>>>>>> + * received. If this is a new page fault, it will be added to a hash
>>>>>>>>> + * table. The return value indicates whether this is a new fault, or
>>>>>>>>> + * a fault that was already known and is already being handled.
>>>>>>>>> + *
>>>>>>>>> + * If there are too many pending page faults, this will fail. Retry
>>>>>>>>> + * interrupts should be ignored in this case until there is enough
>>>>>>>>> + * free space.
>>>>>>>>> + *
>>>>>>>>> + * Returns 0 if the fault was added, 1 if the fault was already known,
>>>>>>>>> + * -ENOSPC if there are too many pending faults.
>>>>>>>>> + */
>>>>>>>>> +int amdgpu_vm_add_fault(struct amdgpu_retryfault_hashtable *fault_hash, u64 key)
>>>>>>>>> +{
>>>>>>>>> +	unsigned long flags;
>>>>>>>>> +	int r = -ENOSPC;
>>>>>>>>> +
>>>>>>>>> +	if (WARN_ON_ONCE(!fault_hash))
>>>>>>>>> +		/* Should be allocated in amdgpu_vm_init
>>>>>>>>> +		 */
>>>>>>>>> +		return r;
>>>>>>>>> +
>>>>>>>>> +	spin_lock_irqsave(&fault_hash->lock, flags);
>>>>>>>>> +
>>>>>>>>> +	/* Only let the hash table fill up to 50% for best performance */
>>>>>>>>> +	if (fault_hash->count >= (1 << (AMDGPU_PAGEFAULT_HASH_BITS-1)))
>>>>>>>>> +		goto unlock_out;
>>>>>>>>> +
>>>>>>>>> +	r = chash_table_copy_in(&fault_hash->hash, key, NULL);
>>>>>>>>> +	if (!r)
>>>>>>>>> +		fault_hash->count++;
>>>>>>>>> +
>>>>>>>>> +	/* chash_table_copy_in should never fail unless we're losing count */
>>>>>>>>> +	WARN_ON_ONCE(r < 0);
>>>>>>>>> +
>>>>>>>>> +unlock_out:
>>>>>>>>> +	spin_unlock_irqrestore(&fault_hash->lock, flags);
>>>>>>>>> +	return r;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * amdgpu_vm_clear_fault - Remove a page fault record
>>>>>>>>> + *
>>>>>>>>> + * @fault_hash: fault hash table
>>>>>>>>> + * @key: 64-bit encoding of PASID and address
>>>>>>>>> + *
>>>>>>>>> + * This should be called when a page fault has been handled. Any
>>>>>>>>> + * future interrupt with this key will be processed as a new
>>>>>>>>> + * page fault.
>>>>>>>>> + */
>>>>>>>>> +void amdgpu_vm_clear_fault(struct amdgpu_retryfault_hashtable *fault_hash, u64 key)
>>>>>>>>> +{
>>>>>>>>> +	unsigned long flags;
>>>>>>>>> +	int r;
>>>>>>>>> +
>>>>>>>>> +	if (!fault_hash)
>>>>>>>>> +		return;
>>>>>>>>> +
>>>>>>>>> +	spin_lock_irqsave(&fault_hash->lock, flags);
>>>>>>>>> +
>>>>>>>>> +	r = chash_table_remove(&fault_hash->hash, key, NULL);
>>>>>>>>> +	if (!WARN_ON_ONCE(r < 0)) {
>>>>>>>>> +		fault_hash->count--;
>>>>>>>>> +		WARN_ON_ONCE(fault_hash->count < 0);
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	spin_unlock_irqrestore(&fault_hash->lock, flags);
>>>>>>>>> +}
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>>>>>>> index e275ee7..6eb1da1 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>>>>>>>> @@ -178,6 +178,13 @@ struct amdgpu_task_info {
>>>>>>>>>  	pid_t	tgid;
>>>>>>>>>  };
>>>>>>>>>  
>>>>>>>>> +#define AMDGPU_PAGEFAULT_HASH_BITS 8
>>>>>>>>> +struct amdgpu_retryfault_hashtable {
>>>>>>>>> +	DECLARE_CHASH_TABLE(hash, AMDGPU_PAGEFAULT_HASH_BITS, 8, 0);
>>>>>>>>> +	spinlock_t	lock;
>>>>>>>>> +	int		count;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>>  struct amdgpu_vm {
>>>>>>>>>  	/* tree of virtual addresses mapped */
>>>>>>>>>  	struct rb_root_cached va;
>>>>>>>>> @@ -240,6 +247,7 @@ struct amdgpu_vm {
>>>>>>>>>  	struct ttm_lru_bulk_move lru_bulk_move;
>>>>>>>>>  	/* mark whether can do the bulk move */
>>>>>>>>>  	bool			bulk_moveable;
>>>>>>>>> +	struct amdgpu_retryfault_hashtable *fault_hash;
>>>>>>>>>  };
>>>>>>>>>  
>>>>>>>>>  struct amdgpu_vm_manager {
>>>>>>>>> @@ -355,4 +363,8 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>>>>>>>>  void amdgpu_vm_move_to_lru_tail(struct amdgpu_device *adev,
>>>>>>>>>  				struct amdgpu_vm *vm);
>>>>>>>>>  
>>>>>>>>> +int amdgpu_vm_add_fault(struct amdgpu_retryfault_hashtable *fault_hash, u64 key);
>>>>>>>>> +
>>>>>>>>> +void amdgpu_vm_clear_fault(struct amdgpu_retryfault_hashtable *fault_hash, u64 key);
>>>>>>>>> +
>>>>>>>>>  #endif
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>>>>>>> index 5ae5ed2..acbe5a7 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
>>>>>>>>> @@ -265,35 +265,36 @@ static bool vega10_ih_prescreen_iv(struct amdgpu_device *adev)
>>>>>>>>>  		return true;
>>>>>>>>>  	}
>>>>>>>>>  
>>>>>>>>> -	addr = ((u64)(dw5 & 0xf) << 44) | ((u64)dw4 << 12);
>>>>>>>>> -	key = AMDGPU_VM_FAULT(pasid, addr);
>>>>>>>>> -	r = amdgpu_ih_add_fault(adev, key);
>>>>>>>>> -
>>>>>>>>> -	/* Hash table is full or the fault is already being processed,
>>>>>>>>> -	 * ignore further page faults
>>>>>>>>> -	 */
>>>>>>>>> -	if (r != 0)
>>>>>>>>> -		goto ignore_iv;
>>>>>>>>> -
>>>>>>>>>  	/* Track retry faults in per-VM fault FIFO. */
>>>>>>>>>  	spin_lock(&adev->vm_manager.pasid_lock);
>>>>>>>>>  	vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>>>>>>>> +	addr = ((u64)(dw5 & 0xf) << 44) | ((u64)dw4 << 12);
>>>>>>>>> +	key = AMDGPU_VM_FAULT(pasid, addr);
>>>>>>>>>  	if (!vm) {
>>>>>>>>>  		/* VM not found, process it normally */
>>>>>>>>>  		spin_unlock(&adev->vm_manager.pasid_lock);
>>>>>>>>> -		amdgpu_ih_clear_fault(adev, key);
>>>>>>>>>  		return true;
>>>>>>>>> +	} else {
>>>>>>>>> +		r = amdgpu_vm_add_fault(vm->fault_hash, key);
>>>>>>>>> +
>>>>>>>>> +		/* Hash table is full or the fault is already
>>>>>>>>> +		 * being processed, ignore further page faults
>>>>>>>>> +		 */
>>>>>>>>> +		if (r != 0) {
>>>>>>>>> +			spin_unlock(&adev->vm_manager.pasid_lock);
>>>>>>>>> +			goto ignore_iv;
>>>>>>>>> +		}
>>>>>>>>>  	}
>>>>>>>>>  	/* No locking required with single writer and single reader */
>>>>>>>>>  	r = kfifo_put(&vm->faults, key);
            if (!r) { >>>>>>>>> >>>>>>>>>                     /* FIFO is full. Ignore it until there is >>>>>>>>> space */ >>>>>>>>> >>>>>>>>>           +         amdgpu_vm_clear_fault(vm->fault_hash, key); >>>>>>>>> >>>>>>>>> spin_unlock(&adev->vm_manager.pasid_lock); >>>>>>>>> >>>>>>>>>           -         amdgpu_ih_clear_fault(adev, key); >>>>>>>>> >>>>>>>>>                     goto ignore_iv; >>>>>>>>> >>>>>>>>>             } >>>>>>>>> >>>>>>>>>           - spin_unlock(&adev->vm_manager.pasid_lock); >>>>>>>>> >>>>>>>>> >>>>>>>>>           + spin_unlock(&adev->vm_manager.pasid_lock); >>>>>>>>> >>>>>>>>>             /* It's the first fault for this address, >>>>>>>>> process it >>>>>>>>> normally */ >>>>>>>>> >>>>>>>>>             return true; >>>>>>>>> >>>>>>>>> >>>>>>>>>           @@ -386,14 +387,6 @@ static int vega10_ih_sw_init(void >>>>>>>>> *handle) >>>>>>>>> >>>>>>>>>             adev->irq.ih.use_doorbell = true; >>>>>>>>> >>>>>>>>>             adev->irq.ih.doorbell_index = AMDGPU_DOORBELL64_IH >>>>>>>>> << 1; >>>>>>>>> >>>>>>>>> >>>>>>>>>           - adev->irq.ih.faults = >>>>>>>>> kmalloc(sizeof(*adev->irq.ih.faults), GFP_KERNEL); >>>>>>>>> >>>>>>>>>           - if (!adev->irq.ih.faults) >>>>>>>>> >>>>>>>>>           -         return -ENOMEM; >>>>>>>>> >>>>>>>>>           - INIT_CHASH_TABLE(adev->irq.ih.faults->hash, >>>>>>>>> >>>>>>>>>           -                  AMDGPU_PAGEFAULT_HASH_BITS, 8, 0); >>>>>>>>> >>>>>>>>>           - spin_lock_init(&adev->irq.ih.faults->lock); >>>>>>>>> >>>>>>>>>           - adev->irq.ih.faults->count = 0; >>>>>>>>> >>>>>>>>>           - >>>>>>>>> >>>>>>>>>             r = amdgpu_irq_init(adev); >>>>>>>>> >>>>>>>>> >>>>>>>>>             return r; >>>>>>>>> >>>>>>>>>           @@ -406,9 +399,6 @@ static int vega10_ih_sw_fini(void >>>>>>>>> *handle) >>>>>>>>> >>>>>>>>>             amdgpu_irq_fini(adev); >>>>>>>>> >>>>>>>>>             amdgpu_ih_ring_fini(adev); >>>>>>>>> >>>>>>>>> >>>>>>>>>           - 
kfree(adev->irq.ih.faults); >>>>>>>>> >>>>>>>>>           - adev->irq.ih.faults = NULL; >>>>>>>>> >>>>>>>>>           - >>>>>>>>> >>>>>>>>>             return 0; >>>>>>>>> >>>>>>>>>            } >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>       _______________________________________________ >>>>>>>>> >>>>>>>>>       amd-gfx mailing list >>>>>>>>> >>>>>>>>>       amd-gfx at lists.freedesktop.org >>>>>>>>> <mailto:amd-gfx at lists.freedesktop.org> >>>>>>>>> >>>>>>>>>       https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>>>>>>> >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> amd-gfx mailing list >>>>>>>> amd-gfx at lists.freedesktop.org >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>>>> _______________________________________________ >>>>>> amd-gfx mailing list >>>>>> amd-gfx at lists.freedesktop.org >>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx at lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >
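The hunks quoted above move the retry-fault hash from the global IH ring to each VM: vega10_ih_prescreen_iv packs (pasid, addr) into a key, amdgpu_vm_add_fault inserts it (the IV is dropped when the key is already present, i.e. the fault is already in flight, or when the table is full), and amdgpu_vm_clear_fault removes the key once the fault has been dealt with or the per-VM FIFO is full. A minimal userspace sketch of that dedup scheme, for illustration only — the table layout, key packing, and return codes are stand-ins, not the driver's chash implementation or its locking:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the driver's closed hash table: a tiny
 * linear-scan set of 64-bit fault keys.  The real table is a chash with
 * 1 << AMDGPU_PAGEFAULT_HASH_BITS buckets and is protected by a spinlock. */
#define FAULT_HASH_SIZE 16

struct fault_hash {
	uint64_t slot[FAULT_HASH_SIZE];
	uint8_t  used[FAULT_HASH_SIZE];
	int      count;
};

/* Mirrors the idea of AMDGPU_VM_FAULT(pasid, addr): PASID in the high
 * bits, page frame number in the low bits (exact packing is hypothetical). */
static uint64_t fault_key(uint32_t pasid, uint64_t addr)
{
	return ((uint64_t)pasid << 48) | (addr >> 12);
}

/* 0: key inserted, first fault for this page -> process the IV.
 * -1: key already present, fault already in flight -> drop the IV.
 * -2: table full -> drop the IV until entries are cleared. */
static int fault_add(struct fault_hash *h, uint64_t key)
{
	int i, free_slot = -1;

	for (i = 0; i < FAULT_HASH_SIZE; i++) {
		if (h->used[i]) {
			if (h->slot[i] == key)
				return -1;
		} else if (free_slot < 0) {
			free_slot = i;
		}
	}
	if (free_slot < 0)
		return -2;
	h->slot[free_slot] = key;
	h->used[free_slot] = 1;
	h->count++;
	return 0;
}

/* Called once the fault has been handled (or could not be queued), so a
 * later fault on the same page is processed again instead of dropped. */
static void fault_clear(struct fault_hash *h, uint64_t key)
{
	int i;

	for (i = 0; i < FAULT_HASH_SIZE; i++) {
		if (h->used[i] && h->slot[i] == key) {
			h->used[i] = 0;
			h->count--;
			return;
		}
	}
}
```

The point of the per-VM move is visible even in this toy: once the set lives in struct amdgpu_vm, its lifetime matches the VM's, so the IH code no longer needs a global table that outlives any particular address space.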