On Sun, Apr 25, 2021 at 04:21:03PM +0000, Chuck Lever III wrote:
>
>
> > On Apr 25, 2021, at 10:19 AM, Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> >
> > On Mon, Apr 19, 2021 at 02:03:12PM -0400, Chuck Lever wrote:
> >> Better not to touch MRs involved in a flush or post error until the
> >> Send and Receive Queues are drained and the transport is fully
> >> quiescent. Simply don't insert such MRs back onto the free list.
> >> They remain on mr_all and will be released when the connection is
> >> torn down.
> >>
> >> I had thought that recycling would prevent hardware resources from
> >> being tied up for a long time. However, since v5.7, a transport
> >> disconnect destroys the QP and other hardware-owned resources. The
> >> MRs get cleaned up nicely at that point.
> >>
> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >
> > Is this a fix for the crash below?
>
> Yes, it is plausible. That is a familiar backtrace.
>
> However, it's usually because the provider called the LocalInv
> completion handler twice for the same CQE. Which provider is this?

It's mlx5 driver, ConnectX-4 (MT27700).

--
Dan Aloni