[PATCH 2/2] drm/amdgpu/virt: schedule work to handle vm fault for VF

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Pixel,

yeah agree with David here, but even busy waiting for the KIQ register 
read is not really an option in the interrupt handler.

Additional to that when we have a VM fault we usually see a mass storm 
of them. So allocating and scheduling a work item for each fault like 
you do here will certainly only result in a whole system crash.

For now I would say just use DRM_ERROR() to print a warning that a VM 
fault happen and we can't decode it because we can't access the register 
under SRIOV.

Regards,
Christian.

Am 06.02.2017 um 09:36 schrieb zhoucm1:
> Hi Pixel,
> I got your mean just now, since your VF must use KIQ to read/write 
> registers, which use fence_wait to wait reading register completed.
> The alternative way is implementing a new kiq reading/writing register 
> way by using udelay instead of fence wait when reading/writing 
> register in interrupt context.
>
> Regards,
> David Zhou
>
> On 2017å¹´02æ??06æ?¥ 15:55, Ding, Pixel wrote:
>> Thanks you for your comments, David. I totally agree on your point.
>>
>> However, The VM fault status registers record the latest VM fault 
>> info no matter when theyâ??re changed, thatâ??s what we care about since 
>> we donâ??t handle too much for VM fault even in bare metal system.
>>
>> On the other hand, what do you think if we insist to sleep in the 
>> interrupt context and let the driver runs into a software bug?
>>
>> The VM registers are shared among all VFs and we donâ??t have a copy 
>> for each in hardware, I think there's no other way to access them in 
>> this case. Do you have any suggestion?
>>
>> â??
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>>
>> On 06/02/2017, 3:33 PM, "Zhou, David(ChunMing)" <David1.Zhou at amd.com> 
>> wrote:
>>
>>> INIT_WORK(&work->base, gmc_v8_0_vm_fault_sched);
>>> However VF is used or not, schedule work shouldn't handle registers 
>>> reading for interrupt, especially for status register, which could 
>>> have been changed when you handle it in schedule work after interrupt.
>>>
>>> Regards,
>>> David Zhou
>>>
>>> -----Original Message-----
>>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On 
>>> Behalf Of Pixel Ding
>>> Sent: Monday, February 06, 2017 3:00 PM
>>> To: amd-gfx at lists.freedesktop.org
>>> Cc: Ding, Pixel <Pixel.Ding at amd.com>
>>> Subject: [PATCH 2/2] drm/amdgpu/virt: schedule work to handle vm 
>>> fault for VF
>>>
>>> VF uses KIQ to access registers that invoking fence_wait to get the 
>>> accessing completed. When VM fault occurs, the driver can't sleep in 
>>> interrupt context.
>>>
>>> For some test cases, VM fault is 'legal' and shouldn't cause driver 
>>> soft lockup.
>>>
>>> Signed-off-by: Pixel Ding <Pixel.Ding at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 46 
>>> ++++++++++++++++++++++++++++++++---
>>> 1 file changed, 43 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> index 7669b32..75c913f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> @@ -1231,9 +1231,9 @@ static int 
>>> gmc_v8_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
>>>     return 0;
>>> }
>>>
>>> -static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>> -                      struct amdgpu_irq_src *source,
>>> -                      struct amdgpu_iv_entry *entry)
>>> +static int gmc_v8_0_process_vm_fault(struct amdgpu_device *adev,
>>> +                     struct amdgpu_irq_src *source,
>>> +                     struct amdgpu_iv_entry *entry)
>>> {
>>>     u32 addr, status, mc_client;
>>>
>>> @@ -1262,6 +1262,46 @@ static int gmc_v8_0_process_interrupt(struct 
>>> amdgpu_device *adev,
>>>     return 0;
>>> }
>>>
>>> +struct gmc_vm_fault_work {
>>> +    struct work_struct      base;
>>> +    struct amdgpu_device    *adev;
>>> +    struct amdgpu_irq_src   *source;
>>> +    struct amdgpu_iv_entry  *entry;
>>> +};
>>> +
>>> +static void gmc_v8_0_vm_fault_sched(struct work_struct *work) {
>>> +    struct gmc_vm_fault_work *vm_work =
>>> +        container_of(work, struct gmc_vm_fault_work, base);
>>> +    struct amdgpu_device *adev = vm_work->adev;
>>> +    struct amdgpu_irq_src *source = vm_work->source;
>>> +    struct amdgpu_iv_entry *entry = vm_work->entry;
>>> +
>>> +    gmc_v8_0_process_vm_fault(adev, source, entry);
>>> +
>>> +    kfree(vm_work);
>>> +}
>>> +
>>> +static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>> +                      struct amdgpu_irq_src *source,
>>> +                      struct amdgpu_iv_entry *entry) {
>>> +    struct gmc_vm_fault_work *work = NULL;
>>> +
>>> +    if (amdgpu_sriov_vf(adev)) {
>>> +        work = kmalloc(sizeof(struct gmc_vm_fault_work), GFP_ATOMIC);
>>> +        if (!work)
>>> +            return -ENOMEM;
>>> +        INIT_WORK(&work->base, gmc_v8_0_vm_fault_sched);
>>> +        work->adev = adev;
>>> +        work->source = source;
>>> +        work->entry = entry;
>>> +        return schedule_work(&work->base);
>>> +    }
>>> +
>>> +    return gmc_v8_0_process_vm_fault(adev, source, entry); }
>>> +
>>> static void fiji_update_mc_medium_grain_clock_gating(struct 
>>> amdgpu_device *adev,
>>>                              bool enable)
>>> {
>>> -- 
>>> 2.7.4
>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux