Re: [PATCH v3 15/26] xprtrdma: Do not recycle MR after FastReg/LocalInv flushes

> On Apr 25, 2021, at 10:19 AM, Dan Aloni <dan@xxxxxxxxxxxx> wrote:
> 
> On Mon, Apr 19, 2021 at 02:03:12PM -0400, Chuck Lever wrote:
>> Better not to touch MRs involved in a flush or post error until the
>> Send and Receive Queues are drained and the transport is fully
>> quiescent. Simply don't insert such MRs back onto the free list.
>> They remain on mr_all and will be released when the connection is
>> torn down.
>> 
>> I had thought that recycling would prevent hardware resources from
>> being tied up for a long time. However, since v5.7, a transport
>> disconnect destroys the QP and other hardware-owned resources. The
>> MRs get cleaned up nicely at that point.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> 
> Is this a fix for the crash below?

Yes, it is plausible. That is a familiar backtrace.

However, this backtrace usually appears because the provider called
the LocalInv completion handler twice for the same CQE. Which
provider is this?
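
To illustrate why a doubled completion would produce this warning,
here is a minimal userspace sketch. It is not the rpcrdma code: the
names toy_mr, toy_wc_localinv and list_del_checked are hypothetical,
the list helpers only mimic the kernel's list_head poisoning, and the
second poison value is illustrative. The first "completion" unlinks
the entry and poisons its pointers; the second finds ->next equal to
LIST_POISON1, which is the condition reported in the trace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* LIST_POISON1 matches the value in the quoted warning; LIST_POISON2 is illustrative. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000200ULL)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Hypothetical checked delete: refuses to unlink an entry whose
 * pointers are already poisoned, then poisons them after unlinking.
 */
static bool list_del_checked(struct list_head *entry)
{
	if (entry->next == LIST_POISON1 || entry->prev == LIST_POISON2)
		return false;	/* entry was already deleted once */

	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
	return true;
}

/* Toy memory region that sits on a single list. */
struct toy_mr {
	struct list_head link;
	int id;
};

/* Hypothetical completion handler that recycles the MR by unlinking it. */
static bool toy_wc_localinv(struct toy_mr *mr)
{
	return list_del_checked(&mr->link);
}

/* Runs the handler twice for one MR; returns how many calls were rejected. */
static int demo_double_completion(void)
{
	struct list_head all;
	struct toy_mr mr = { .id = 1 };
	int rejected = 0;

	list_init(&all);
	list_add(&mr.link, &all);

	if (!toy_wc_localinv(&mr))	/* first completion: unlinks normally */
		rejected++;
	if (!toy_wc_localinv(&mr))	/* duplicate completion: sees poison */
		rejected++;

	return rejected;
}
```

Calling demo_double_completion() from a test harness shows only the
duplicate call being rejected. Not touching the MR's list linkage from
the flush path at all, as the patch description proposes, avoids the
second unlink regardless of how many times the handler runs.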


> I just wonder if it appeared for
> others in the wild, and the fix is not just theoretical.
> 
>    WARNING: CPU: 5 PID: 20312 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
>    list_del corruption, ffff9df150b06768->next is LIST_POISON1 (dead000000000100)
> 
>    Call Trace:
>     [<ffffffff99764147>] dump_stack+0x19/0x1b
>     [<ffffffff99098848>] __warn+0xd8/0x100
>     [<ffffffff990988cf>] warn_slowpath_fmt+0x5f/0x80
>     [<ffffffff9921d5f6>] ? kfree+0x106/0x140
>     [<ffffffff99396953>] __list_del_entry+0x63/0xd0
>     [<ffffffff993969cd>] list_del+0xd/0x30
>     [<ffffffffc0bb307f>] frwr_mr_recycle+0xaf/0x150 [rpcrdma]
>     [<ffffffffc0bb3264>] frwr_wc_localinv+0x94/0xa0 [rpcrdma]
>     [<ffffffffc067d20e>] __ib_process_cq+0x8e/0x100 [ib_core]
>     [<ffffffffc067d2f9>] ib_cq_poll_work+0x29/0x70 [ib_core]
>     [<ffffffff990baf9f>] process_one_work+0x17f/0x440
>     [<ffffffff990bc036>] worker_thread+0x126/0x3c0
>     [<ffffffff990bbf10>] ? manage_workers.isra.25+0x2a0/0x2a0
>     [<ffffffff990c2e81>] kthread+0xd1/0xe0
>     [<ffffffff990c2db0>] ? insert_kthread_work+0x40/0x40
>     [<ffffffff99776c37>] ret_from_fork_nospec_begin+0x21/0x21
>     [<ffffffff990c2db0>] ? insert_kthread_work+0x40/0x40
> 
> -- 
> Dan Aloni

--
Chuck Lever






