On Mon, 2019-05-06 at 12:31 +0000, Marciniszyn, Mike wrote:
> > Correct me if I'm wrong Tejun, but the key issues are:
> >
> > All WQ_MEM_RECLAIM work queues are eligible to be run when the machine
> > is under extreme memory pressure and attempting to reclaim memory. That
> > means that the workqueue:
> >
> > 1) MUST not perform any GFP_ATOMIC allocations as this could deadlock
>
> The send engine code WILL do a GFP_ATOMIC allocation but the code handles
> failure as will any other resource shortage.

You're right. I was misremembering the flag's full meaning (I double
checked before writing this). If you are holding a spinlock (or
anything else that means you can't sleep), you must use GFP_ATOMIC and
you must be prepared for failure. So, before putting WQ_MEM_RECLAIM on
your workqueue, it should use ATOMIC and be prepared for failure.

> > 2) SHOULD not rely on any GFP_KERNEL allocations as these may fail
>
> There are no GFP_KERNEL allocations in the send engine code.

Right.

> > 3) MUST complete without blocking
>
> All resource blockages are handled by queuing the current QP being serviced
> by the send engine for an interrupt to wake that QP up via the send engine.

Ok.

> > 4) SHOULD ideally always make some sort of forward progress if at all
> > possible without needing memory allocations to do so
> >
>
> As noted above.

Right.

> > Mike, does hfi1_do_send() meet these requirements? If not, we should
> > not be putting WQ_MEM_RECLAIM on it, and instead should find another
> > solution to the current trace issue.
> >
>
> I'm not sure I understand the 1) above.
>
> Tejun, can you elaborate?

My mistake. It's been a long while since I coded the stuff I did for
memory reclaim pressure and I had my flag usage wrong in my memory.
From the description you just gave, the original patch to add
WQ_MEM_RECLAIM is ok. I probably still need to audit the ipoib usage
though.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD