+cc Bhaktipriya, Tejun and Jeff On 2020/2/19 14:45, Leon Romanovsky wrote: > On Wed, Feb 19, 2020 at 09:13:23AM +0800, Yunsheng Lin wrote: >> On 2020/2/18 23:31, Jason Gunthorpe wrote: >>> On Tue, Feb 18, 2020 at 11:35:35AM +0800, Lang Cheng wrote: >>>> The hns3 driver sets "hclge_service_task" workqueue with >>>> WQ_MEM_RECLAIM flag in order to guarantee forward progress >>>> under memory pressure. >>> >>> Don't do that. WQ_MEM_RECLAIM is only to be used by things interlinked >>> with reclaimed processing. >>> >>> Work on queues marked with WQ_MEM_RECLAIM can't use GFP_KERNEL >>> allocations, can't do certain kinds of sleeps, can't hold certain >>> kinds of locks, etc. By the way, what kind of sleeps and locks can not be done in the work queued to wq marked with WQ_MEM_RECLAIM? >> >> From mlx5 driver, it seems that there is GFP_KERNEL allocations >> on wq marked with WQ_MEM_RECLAIM too: >> >> mlx5e_tx_timeout_work() -> mlx5e_safe_reopen_channels() -> >> mlx5e_safe_switch_channels() -> mlx5e_open_channels() >> >> kcalloc() is called with GFP_KERNEL in mlx5e_open_channels(), >> and mlx5e_tx_timeout_work() is queued with priv->wq, which is >> allocated with WQ_MEM_RECLAIM flags. see: >> >> mlx5e_netdev_init() -> create_singlethread_workqueue() > > There are two reasons for that, first mlx5 driver was written far before > WQ_MEM_RECLAIM usage was clarified, second mlx5 has bugs. > >> >> >> From the comment in kernel/workqueue.c, the work queued with >> wq with WQ_MEM_RECLAIM flag set seems to be executed without >> blocking under some rare case. I still not quite understand >> the comment, and I can not find any doc that point out the >> GFP_KERNEL allocations can not be done in wq with WQ_MEM_RECLAIM >> yet. Is there any doc that mentions that GFP_KERNEL allocations >> can not be done in wq with WQ_MEM_RECLAIM? > > It is whole purpose of WQ_MEM_RECLAIM flag - allow progress in case of > memory pressure. Allocation memory while we are under memory pressure > is an invitation for a disaster. Ok, make sense. > >> >> >> /** >> * rescuer_thread - the rescuer thread function >> * @__rescuer: self >> * >> * Workqueue rescuer thread function. There's one rescuer for each >> * workqueue which has WQ_MEM_RECLAIM set. >> * >> * Regular work processing on a pool may block trying to create a new >> * worker which uses GFP_KERNEL allocation which has slight chance of >> * developing into deadlock if some works currently on the same queue >> * need to be processed to satisfy the GFP_KERNEL allocation. This is >> * the problem rescuer solves. >> * >> * When such condition is possible, the pool summons rescuers of all >> * workqueues which have works queued on the pool and let them process >> * those works so that forward progress can be guaranteed. >> * >> * This should happen rarely. >> * >> * Return: 0 >> */ >> >> >> The below is the reason we add the sets "hclge_service_task" workqueue >> with WQ_MEM_RECLAIM through analysing why other ethernet drivers has >> allocated wq with WQ_MEM_RECLAIM flag, I may be wrong about that: > > Many drivers are developed using copy/paste technique, so it is wrong > to assume that "other ethernet drivers" did the right thing. > >> >> hns3 ethernet driver may be used as the low level transport of a >> network file system, memory reclaim data path may depend on the >> worker in hns3 driver to bring back the ethernet link so that it flush >> the some cache to network based disk. > > Unlikely that this "network file system" dependency on ethernet link is correct. Ok, I may be wrong about the above usecase. but the below commit explicitly state that network devices may be used in memory reclaim path. 0a38c17a21a0 ("fm10k: Remove create_workqueue"): fm10k: Remove create_workqueue alloc_workqueue replaces deprecated create_workqueue(). A dedicated workqueue has been used since the workitem (viz fm10k_service_task, which manages and runs other subtasks) is involved in normal device operation and requires forward progress under memory pressure. create_workqueue has been replaced with alloc_workqueue with max_active as 0 since there is no need for throttling the number of active work items. Since network devices may be used in memory reclaim path, WQ_MEM_RECLAIM has been set to guarantee forward progress. flush_workqueue is unnecessary since destroy_workqueue() itself calls drain_workqueue() which flushes repeatedly till the workqueue becomes empty. Hence the call to flush_workqueue() has been dropped. Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@xxxxxxxxx> Acked-by: Tejun Heo <tj@xxxxxxxxxx> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@xxxxxxxxx> So: 1. Maybe the above commit log is misleading, and network device driver's wq does not need the WQ_MEM_RECLAIM flag, then maybe document what can not be done in the work queued to wq marked with WQ_MEM_RECLAIM, and remove the WQ_MEM_RECLAIM flag for the wq of network device driver. 2. If the network device driver's wq does need the WQ_MEM_RECLAIM flag, then hns3 may have tow problems here: WQ_MEM_RECLAIM wq flushing !WQ_MEM_RECLAIM wq problem and GFP_KERNEL allocations in the work queued to WQ_MEM_RECLAIM wq. > > Thanks > >> >>> >>> Jason >>> >>> >> > > . >