I think it doesn't matter if workitem schedule out. Even we did not schedule out, the workqueue itself will schedule out later.
So it did not break anything with this patch I think.
From: Pan, Xinhui <Xinhui.Pan@xxxxxxx>
Sent: Thursday, April 9, 2020 10:07:09 PM
To: Lucas Stach <l.stach@xxxxxxxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker
Sent: Thursday, April 9, 2020 10:07:09 PM
To: Lucas Stach <l.stach@xxxxxxxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker
Why we break out the loops when there are pending bos to be released?
And I just checked the process_one_work. Right after the work item callback is called, the workqueue itself will call cond_resched. So I think
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach <l.stach@xxxxxxxxxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach <l.stach@xxxxxxxxxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker
Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.
Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.
Getting scheduled away because we wait for a lock is of course perfectly
fine.
> The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.
Yeah, and exactly that's what I would say we should avoid here :)
In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).
Otherwise as you said new threads for work item processing are started
up and I don't think we want that.
Just returning from the work item and waiting for the next cycle is most
likely the better option.
Regards,
Christian.
>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(&bdev->ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
>>> }
>>>
>>> ttm_bo_put(bo);
>>> + cond_resched();
>>> spin_lock(&glob->lru_lock);
>>> }
>>> list_splice_tail(&removed, &bdev->ddestroy);
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@xxxxxxxxxxxxxxxxxxxxx
>> https://nam11.safelinks.protection.outlook.com/?url="">
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.
Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.
Getting scheduled away because we wait for a lock is of course perfectly
fine.
> The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.
Yeah, and exactly that's what I would say we should avoid here :)
In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).
Otherwise as you said new threads for work item processing are started
up and I don't think we want that.
Just returning from the work item and waiting for the next cycle is most
likely the better option.
Regards,
Christian.
>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(&bdev->ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
>>> }
>>>
>>> ttm_bo_put(bo);
>>> + cond_resched();
>>> spin_lock(&glob->lru_lock);
>>> }
>>> list_splice_tail(&removed, &bdev->ddestroy);
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@xxxxxxxxxxxxxxxxxxxxx
>> https://nam11.safelinks.protection.outlook.com/?url="">
_______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx