Re: [PATCH net-next] net: link_watch: prevent starvation when processing linkwatch wq

Yunsheng Lin <linyunsheng@xxxxxxxxxx> · Tue, 25 Jun 2019 10:28:04 +0800

On 2019/5/29 16:59, Yunsheng Lin wrote:
> On 2019/5/29 14:58, David Miller wrote:
>> From: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>> Date: Mon, 27 May 2019 09:47:54 +0800
>>
>>> When a user has configured a large number of virtual netdevs, such
>>> as 4K vlans, a carrier on/off operation on the real netdev
>>> will also cause its virtual netdevs' link state to be processed
>>> in linkwatch. Currently, the processing is done in a work queue,
>>> which may cause a worker starvation problem for other work queues.
>>>
>>> This patch releases the cpu when the link watch worker has processed
>>> a fixed number of netdev link watch events, and schedules the
>>> work queue again when there are still link watch events remaining.
>>>
>>> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
>>
>> Why not rtnl_unlock(); yield(); rtnl_lock(); every "100" events
>> processed?
>>
>> That seems better than adding all of this overhead to reschedule the
>> workqueue every 100 items.
> 
> One minor concern: the above solution does not seem to solve the cpu
> starvation for other normal workqueues scheduled on the same
> cpu as linkwatch. Maybe I misunderstand the workqueue, or is there some
> other consideration here? :)
> 
> Anyway, I will implement it as you suggested and test it before posting V2.
> Thanks.

Hi, David

I stress tested the above solution with a lot of vlan devs and qemu-kvm in
vf passthrough mode. The linkwatch wq sometimes blocks the irqfd_inject wq
when they are scheduled on the same cpu, which may cause interrupt delay
problems for the vm.

Rescheduling the workqueue every 100 items does let the irqfd_inject wq run sooner,
which alleviates the interrupt delay problems for the vm.

So is it ok for me to fall back to rescheduling the link watch wq every 100 items,
or is there a better way to fix it properly?
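
To make the rescheduling idea concrete, here is a minimal userspace sketch of
the batching logic only. It is not the kernel patch: struct lw_queue,
lw_run_once(), lw_drain() and LW_BUDGET are hypothetical names, and the loop in
lw_drain() merely stands in for the workqueue running the work item again.

```c
#include <stddef.h>

/* Hypothetical budget; the thread talks about "100" events per pass. */
#define LW_BUDGET 100

struct lw_queue {
	size_t pending;	/* link watch events still waiting */
	size_t passes;	/* how many times the worker has run */
};

/*
 * Handle at most LW_BUDGET events in one pass.
 * Returns 1 if events remain and the work should be queued again,
 * 0 once the queue is drained.
 */
static int lw_run_once(struct lw_queue *q)
{
	size_t n = q->pending < LW_BUDGET ? q->pending : LW_BUDGET;

	q->pending -= n;	/* stand-in for processing n link events */
	q->passes++;
	return q->pending > 0;
}

/* Stand-in for the workqueue: rerun the worker while it asks for it. */
static size_t lw_drain(struct lw_queue *q)
{
	while (lw_run_once(q))
		;	/* other work items on this cpu could run between passes */
	return q->passes;
}
```

With 4096 pending events (roughly the 4K vlan case), the worker returns to the
queue after every 100 events instead of holding the cpu for the whole list, so
other work queued on the same cpu gets a chance to run between passes.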
