Re: [RFC rdma-next] RDMA/core: Add attribute WQ_MEM_RECLAIM to workqueue "infiniband"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



+cc Bhaktipriya, Tejun and Jeff

On 2020/2/19 14:45, Leon Romanovsky wrote:
> On Wed, Feb 19, 2020 at 09:13:23AM +0800, Yunsheng Lin wrote:
>> On 2020/2/18 23:31, Jason Gunthorpe wrote:
>>> On Tue, Feb 18, 2020 at 11:35:35AM +0800, Lang Cheng wrote:
>>>> The hns3 driver sets "hclge_service_task" workqueue with
>>>> WQ_MEM_RECLAIM flag in order to guarantee forward progress
>>>> under memory pressure.
>>>
>>> Don't do that. WQ_MEM_RECLAIM is only to be used by things interlinked
>>> with reclaimed processing.
>>>
>>> Work on queues marked with WQ_MEM_RECLAIM can't use GFP_KERNEL
>>> allocations, can't do certain kinds of sleeps, can't hold certain
>>> kinds of locks, etc.

By the way, what kind of sleeps and locks can not be done in the work
queued to wq marked with WQ_MEM_RECLAIM?

>>
>> From mlx5 driver, it seems that there is GFP_KERNEL allocations
>> on wq marked with WQ_MEM_RECLAIM too:
>>
>> mlx5e_tx_timeout_work() -> mlx5e_safe_reopen_channels() ->
>> mlx5e_safe_switch_channels() -> mlx5e_open_channels()
>>
>> kcalloc() is called with GFP_KERNEL in mlx5e_open_channels(),
>> and mlx5e_tx_timeout_work() is queued with priv->wq, which is
>> allocated with WQ_MEM_RECLAIM flags. see:
>>
>> mlx5e_netdev_init() -> create_singlethread_workqueue()
> 
> There are two reasons for that, first mlx5 driver was written far before
> WQ_MEM_RECLAIM usage was clarified, second mlx5 has bugs.
> 
>>
>>
>> From the comment in kernel/workqueue.c, the work queued with
>> wq with WQ_MEM_RECLAIM flag set seems to be executed without
>> blocking under some rare case. I still not quite understand
>> the comment, and I can not find any doc that point out the
>> GFP_KERNEL allocations can not be done in wq with WQ_MEM_RECLAIM
>> yet. Is there any doc that mentions that GFP_KERNEL allocations
>> can not be done in wq with WQ_MEM_RECLAIM?
> 
> It is whole purpose of WQ_MEM_RECLAIM flag - allow progress in case of
> memory pressure. Allocation memory while we are under memory pressure
> is an invitation for a disaster.

Ok, make sense.

> 
>>
>>
>> /**
>>  * rescuer_thread - the rescuer thread function
>>  * @__rescuer: self
>>  *
>>  * Workqueue rescuer thread function.  There's one rescuer for each
>>  * workqueue which has WQ_MEM_RECLAIM set.
>>  *
>>  * Regular work processing on a pool may block trying to create a new
>>  * worker which uses GFP_KERNEL allocation which has slight chance of
>>  * developing into deadlock if some works currently on the same queue
>>  * need to be processed to satisfy the GFP_KERNEL allocation.  This is
>>  * the problem rescuer solves.
>>  *
>>  * When such condition is possible, the pool summons rescuers of all
>>  * workqueues which have works queued on the pool and let them process
>>  * those works so that forward progress can be guaranteed.
>>  *
>>  * This should happen rarely.
>>  *
>>  * Return: 0
>>  */
>>
>>
>> The below is the reason we add the sets "hclge_service_task" workqueue
>> with WQ_MEM_RECLAIM through analysing why other ethernet drivers has
>> allocated wq with WQ_MEM_RECLAIM flag, I may be wrong about that:
> 
> Many drivers are developed using copy/paste technique, so it is wrong
> to assume that "other ethernet drivers" did the right thing.
> 
>>
>> hns3 ethernet driver may be used as the low level transport of a
>> network file system, memory reclaim data path may depend on the
>> worker in hns3 driver to bring back the ethernet link so that it flush
>> the some cache to network based disk.
> 
> Unlikely that this "network file system" dependency on ethernet link is correct.

Ok, I may be wrong about the above usecase.
but the below commit explicitly state that network devices may be used in
memory reclaim path.

0a38c17a21a0 ("fm10k: Remove create_workqueue"):

fm10k: Remove create_workqueue

alloc_workqueue replaces deprecated create_workqueue().

A dedicated workqueue has been used since the workitem (viz
fm10k_service_task, which manages and runs other subtasks) is involved in
normal device operation and requires forward progress under memory
pressure.

create_workqueue has been replaced with alloc_workqueue with max_active
as 0 since there is no need for throttling the number of active work
items.

Since network devices may be used in memory reclaim path,
WQ_MEM_RECLAIM has been set to guarantee forward progress.

flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty. Hence the call to flush_workqueue() has been dropped.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@xxxxxxxxx>
Acked-by: Tejun Heo <tj@xxxxxxxxxx>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@xxxxxxxxx>

So:
1. Maybe the above commit log is misleading, and network device driver's
   wq does not need the WQ_MEM_RECLAIM flag, then maybe document what can
   not be done in the work queued to wq marked with WQ_MEM_RECLAIM, and
   remove the WQ_MEM_RECLAIM flag for the wq of network device driver.


2. If the network device driver's wq does need the WQ_MEM_RECLAIM flag, then
   hns3 may have tow problems here: WQ_MEM_RECLAIM wq flushing !WQ_MEM_RECLAIM
   wq problem and GFP_KERNEL allocations in the work queued to WQ_MEM_RECLAIM wq.

> 
> Thanks
> 
>>
>>>
>>> Jason
>>>
>>>
>>
> 
> .
> 




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux