Hi David/Christian,

I tested the proposed method, which prints a warning only; there's no problem. I will send another review because it's a totally different change.

--
Sincerely Yours,
Pixel

On 06/02/2017, 4:56 PM, "Christian König" <deathsimple at vodafone.de> wrote:

>Hi Pixel,
>
>yeah, agree with David here, but even busy-waiting for the KIQ register
>read is not really an option in the interrupt handler.
>
>In addition to that, when we have a VM fault we usually see a massive
>storm of them. So allocating and scheduling a work item for each fault
>like you do here will certainly only result in a whole system crash.
>
>For now I would say just use DRM_ERROR() to print a warning that a VM
>fault happened and we can't decode it because we can't access the
>registers under SRIOV.
>
>Regards,
>Christian.
>
>On 06.02.2017 at 09:36, zhoucm1 wrote:
>> Hi Pixel,
>> I got your point just now: since your VF must use KIQ to read/write
>> registers, it uses fence_wait to wait for the register read to
>> complete.
>> The alternative is to implement a new KIQ register read/write path
>> that uses udelay instead of a fence wait when reading/writing
>> registers in interrupt context.
>>
>> Regards,
>> David Zhou
>>
>> On 2017/02/06 15:55, Ding, Pixel wrote:
>>> Thank you for your comments, David. I totally agree with your point.
>>>
>>> However, the VM fault status registers record the latest VM fault
>>> info no matter when they're changed; that's what we care about,
>>> since we don't do much for VM faults even on a bare-metal system.
>>>
>>> On the other hand, what do you think if we insist on sleeping in the
>>> interrupt context and let the driver run into a software bug?
>>>
>>> The VM registers are shared among all VFs and we don't have a copy
>>> for each in hardware; I think there's no other way to access them in
>>> this case. Do you have any suggestions?
>>>
>>> --
>>> Sincerely Yours,
>>> Pixel
>>>
>>> On 06/02/2017, 3:33 PM, "Zhou, David(ChunMing)" <David1.Zhou at amd.com>
>>> wrote:
>>>
>>>> INIT_WORK(&work->base, gmc_v8_0_vm_fault_sched);
>>>> Whether a VF is used or not, scheduled work shouldn't handle
>>>> register reads for an interrupt, especially for status registers,
>>>> which could have changed by the time you handle them in scheduled
>>>> work after the interrupt.
>>>>
>>>> Regards,
>>>> David Zhou
>>>>
>>>> -----Original Message-----
>>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On
>>>> Behalf Of Pixel Ding
>>>> Sent: Monday, February 06, 2017 3:00 PM
>>>> To: amd-gfx at lists.freedesktop.org
>>>> Cc: Ding, Pixel <Pixel.Ding at amd.com>
>>>> Subject: [PATCH 2/2] drm/amdgpu/virt: schedule work to handle vm
>>>> fault for VF
>>>>
>>>> The VF uses KIQ to access registers, which invokes fence_wait until
>>>> the access completes. When a VM fault occurs, the driver can't
>>>> sleep in interrupt context.
>>>>
>>>> For some test cases, a VM fault is 'legal' and shouldn't cause a
>>>> driver soft lockup.
>>>>
>>>> Signed-off-by: Pixel Ding <Pixel.Ding at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 46 ++++++++++++++++++++++++++++++++---
>>>>  1 file changed, 43 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> index 7669b32..75c913f 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> @@ -1231,9 +1231,9 @@ static int gmc_v8_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
>>>>  	return 0;
>>>>  }
>>>>
>>>> -static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>>> -				      struct amdgpu_irq_src *source,
>>>> -				      struct amdgpu_iv_entry *entry)
>>>> +static int gmc_v8_0_process_vm_fault(struct amdgpu_device *adev,
>>>> +				     struct amdgpu_irq_src *source,
>>>> +				     struct amdgpu_iv_entry *entry)
>>>>  {
>>>>  	u32 addr, status, mc_client;
>>>>
>>>> @@ -1262,6 +1262,46 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>>>  	return 0;
>>>>  }
>>>>
>>>> +struct gmc_vm_fault_work {
>>>> +	struct work_struct base;
>>>> +	struct amdgpu_device *adev;
>>>> +	struct amdgpu_irq_src *source;
>>>> +	struct amdgpu_iv_entry *entry;
>>>> +};
>>>> +
>>>> +static void gmc_v8_0_vm_fault_sched(struct work_struct *work)
>>>> +{
>>>> +	struct gmc_vm_fault_work *vm_work =
>>>> +		container_of(work, struct gmc_vm_fault_work, base);
>>>> +	struct amdgpu_device *adev = vm_work->adev;
>>>> +	struct amdgpu_irq_src *source = vm_work->source;
>>>> +	struct amdgpu_iv_entry *entry = vm_work->entry;
>>>> +
>>>> +	gmc_v8_0_process_vm_fault(adev, source, entry);
>>>> +
>>>> +	kfree(vm_work);
>>>> +}
>>>> +
>>>> +static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>>> +				      struct amdgpu_irq_src *source,
>>>> +				      struct amdgpu_iv_entry *entry)
>>>> +{
>>>> +	struct gmc_vm_fault_work *work = NULL;
>>>> +
>>>> +	if (amdgpu_sriov_vf(adev)) {
>>>> +		work = kmalloc(sizeof(struct gmc_vm_fault_work), GFP_ATOMIC);
>>>> +		if (!work)
>>>> +			return -ENOMEM;
>>>> +		INIT_WORK(&work->base, gmc_v8_0_vm_fault_sched);
>>>> +		work->adev = adev;
>>>> +		work->source = source;
>>>> +		work->entry = entry;
>>>> +		return schedule_work(&work->base);
>>>> +	}
>>>> +
>>>> +	return gmc_v8_0_process_vm_fault(adev, source, entry);
>>>> +}
>>>> +
>>>>  static void fiji_update_mc_medium_grain_clock_gating(struct amdgpu_device *adev,
>>>>  						     bool enable)
>>>>  {
>>>> --
>>>> 2.7.4
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx