> On Aug 29, 2022, at 2:26 PM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> 
> On Mon, Aug 29, 2022 at 06:15:28PM +0000, Chuck Lever III wrote:
>> 
>>> On Aug 29, 2022, at 1:22 PM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> 
>>> Even a simple case like mlx5 may cause the NIC to trigger a host
>>> memory allocation, which is done in another thread and done as a
>>> normal GFP_KERNEL. This memory allocation must progress before a
>>> CQ/QP/MR/etc can be created. So now we are deadlocked again.
>> 
>> That sounds to me like a bug in mlx5. The driver is supposed
>> to respect the caller's GFP settings. Again, if the request
>> is small, it's likely to succeed anyway, but larger requests
>> are not reliable and need to fail quickly so the system can
>> move onto other fishing spots.
> 
> It is a design artifact, the FW is the one requesting the memory and
> it has no idea about kernel GFP flags. As above a FW thread could have
> already started requesting memory for some other purpose and we may
> already be inside the mlx5 FW page request thread under a GFP_KERNEL
> allocation doing reclaim.

How can this ever be fixed?

I'm willing to admit I'm no expert here. But... IIUC the deadlock
problem is triggered by /waiting/ for memory to become available to
satisfy an allocation request. So using GFP_NOWAIT,
GFP_NOIO/memalloc_noio, and GFP_NOFS/memalloc_nofs when drivers
allocate memory should be enough to prevent a deadlock and keep the
allocations from diving into reserved memory.

I believe only GFP_ATOMIC goes for reserved memory pools. These others
are normal allocations that simply do not wait if a direct reclaim
should be required.

The second-order issue is that the "failed to allocate" recovery paths
are not likely to be well tested, and these other flags make that kind
of failure more likely. Enable memory allocation failure injection and
begin fixing the shit that comes up. If you've got "can't fail"
scenarios, we'll have to look at those closely.

--
Chuck Lever
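
[Editor's note: the following is an illustrative sketch, not code from this
thread. It shows the two mechanisms Chuck refers to above: passing GFP_NOWAIT
for an individual allocation, and using the memalloc_noio_save()/restore()
scope around a GFP_KERNEL allocation. The struct and function names (foo_ctx,
foo_alloc_nowait, foo_alloc_noio) are hypothetical.]

/*
 * Illustrative sketch only: two ways a driver allocation path could
 * avoid sleeping in I/O-driven reclaim.  foo_ctx, foo_alloc_nowait()
 * and foo_alloc_noio() are made-up names for illustration.
 */
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/sched/mm.h>
#include <linux/errno.h>

struct foo_ctx {
	void *buf;
};

/*
 * Pass GFP_NOWAIT for this one allocation: it never enters direct
 * reclaim, so it cannot deadlock on memory becoming available, but it
 * fails fast under pressure and the caller must handle -ENOMEM.
 */
static int foo_alloc_nowait(struct foo_ctx *ctx, size_t len)
{
	ctx->buf = kzalloc(len, GFP_NOWAIT);
	if (!ctx->buf)
		return -ENOMEM;
	return 0;
}

/*
 * Keep GFP_KERNEL but wrap the call chain in the noio scope, so
 * allocations made by helpers we call (which may not take gfp_t
 * arguments) also avoid I/O-triggering reclaim.
 */
static int foo_alloc_noio(struct foo_ctx *ctx, size_t len)
{
	unsigned int noio_flags = memalloc_noio_save();
	int ret = 0;

	ctx->buf = kzalloc(len, GFP_KERNEL);
	if (!ctx->buf)
		ret = -ENOMEM;

	memalloc_noio_restore(noio_flags);
	return ret;
}

[The -ENOMEM branches above are the "failed to allocate" recovery paths Chuck
mentions; the kernel's failslab and fail_page_alloc fault-injection knobs can
be used to exercise them under test.]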