On Wed, Feb 19, 2020 at 03:40:59PM +0800, Yunsheng Lin wrote:
> +cc Bhaktipriya, Tejun and Jeff
>
> On 2020/2/19 14:45, Leon Romanovsky wrote:
> > On Wed, Feb 19, 2020 at 09:13:23AM +0800, Yunsheng Lin wrote:
> >> On 2020/2/18 23:31, Jason Gunthorpe wrote:
> >>> On Tue, Feb 18, 2020 at 11:35:35AM +0800, Lang Cheng wrote:
> >>>> The hns3 driver sets "hclge_service_task" workqueue with
> >>>> WQ_MEM_RECLAIM flag in order to guarantee forward progress
> >>>> under memory pressure.
> >>>
> >>> Don't do that. WQ_MEM_RECLAIM is only to be used by things interlinked
> >>> with reclaimed processing.
> >>>
> >>> Work on queues marked with WQ_MEM_RECLAIM can't use GFP_KERNEL
> >>> allocations, can't do certain kinds of sleeps, can't hold certain
> >>> kinds of locks, etc.
>
> By the way, what kind of sleeps and locks can not be done in the work
> queued to wq marked with WQ_MEM_RECLAIM?

Anything that recurses back into a blocking allocation function.

If we are freeing memory because an allocation failed (eg GFP_KERNEL)
then we cannot go back into a blockable allocation while trying to
progress the first failing allocation. That is a deadlock.

So a WQ cannot hold any locks that enclose GFP_KERNEL in any other
threads.

Unfortunately we don't have a lockdep test for this by default.

> >> hns3 ethernet driver may be used as the low level transport of a
> >> network file system, memory reclaim data path may depend on the
> >> worker in hns3 driver to bring back the ethernet link so that it flush
> >> the some cache to network based disk.
> >
> > Unlikely that this "network file system" dependency on ethernet link is correct.
>
> Ok, I may be wrong about the above usecase. but the below commit
> explicitly state that network devices may be used in memory reclaim
> path.

I don't really know how this works when the networking stacks
intersect with the block stack.

Forward progress on something like a NVMeOF requires a lot of stuff to
be working, and presumably under reclaim.

But we can't make everything WQ_MEM_RECLAIM safe, because then we could
never do a GFP_KERNEL allocation.

I have never seen specific guidance on what to do here; I assume it is
broken.

Jason