Why do we break out of the loop when there are still pending BOs to be released?
And I just checked process_one_work(): right after the work item callback is called, the workqueue itself calls cond_resched(). So I think
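For reference, here is a condensed sketch of the relevant part of process_one_work() in kernel/workqueue.c (simplified, not the exact upstream code):

	static void process_one_work(struct worker *worker, struct work_struct *work)
	{
		...
		worker->current_func(work);	/* invoke the work item callback */
		...
		/*
		 * The workqueue core already provides a reschedule point here,
		 * after every work item completes, which is what matters on
		 * non-preemptible kernels.
		 */
		cond_resched();
		...
	}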
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, April 9, 2020 9:38:24 PM
To: Lucas Stach <l.stach@xxxxxxxxxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx <dri-devel@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [PATCH] drm/ttm: Schedule out if possible in bo delayed delete worker
Am 09.04.20 um 15:25 schrieb Lucas Stach:
> Am Donnerstag, den 09.04.2020, 14:35 +0200 schrieb Christian König:
>> Am 09.04.20 um 03:31 schrieb xinhui pan:
>>> The delayed delete list is per device and might be very large. In a
>>> heavy workload test the list might never become empty, which can
>>> trigger RCU stall warnings or soft lockups in non-preemptible kernels.
>>> Let's schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.
Let me refine the sentence: work items are not really supposed to
schedule themselves away on purpose. E.g. you shouldn't call schedule()
or cond_resched() like in the case here.
Getting scheduled away because we wait for a lock is of course perfectly
fine.
> The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.
Yeah, and exactly that's what I would say we should avoid here :)
In other words, work items can be scheduled away, but they shouldn't be
unless it's really necessary (e.g. waiting for a lock).
Otherwise, as you said, new threads for work item processing are started
up, and I don't think we want that.
Just returning from the work item and waiting for the next cycle is most
likely the better option.
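A minimal sketch of that alternative, based on the loop shown in the patch below; note that should_reschedule() does not exist as a kernel helper, so need_resched() is used here instead:

	/*
	 * Hypothetical variant: instead of calling cond_resched() inside the
	 * loop, stop early when a reschedule is pending and let the next run
	 * of the delayed delete work pick up the remaining BOs.
	 */
	while (!list_empty(&bdev->ddestroy) && !need_resched()) {
		struct ttm_buffer_object *bo;

		bo = list_first_entry(&bdev->ddestroy,
				      struct ttm_buffer_object, ddestroy);
		/* ... existing per-BO cleanup and ttm_bo_put(bo) ... */
	}
	list_splice_tail(&removed, &bdev->ddestroy);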
Regards,
Christian.
>
> Regards,
> Lucas
>
>> Maybe rather change the while into while (!list_empty(&bdev->ddestroy)
>> && !should_reschedule(0)).
>>
>> Christian.
>>
>>> Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 9e07c3f75156..b8d853cab33b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -541,6 +541,7 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
>>>  		}
>>>  
>>>  		ttm_bo_put(bo);
>>> +		cond_resched();
>>>  		spin_lock(&glob->lru_lock);
>>>  	}
>>>  	list_splice_tail(&removed, &bdev->ddestroy);