Hi, thanks. I will test this patch, but I am worried it may affect performance. Should we also consider null-pointer protection?

zhang.guanghui@xxxxxxxx

From: Maurizio Lombardi
Date: 2025-02-12 16:52
To: Maurizio Lombardi; zhang.guanghui@xxxxxxxx; chunguang.xu
CC: mgurtovoy; sagi; kbusch; sashal; linux-kernel; linux-nvme; linux-block
Subject: Re: nvme-tcp: fix a possible UAF when failing to send request

On Wed Feb 12, 2025 at 9:11 AM CET, Maurizio Lombardi wrote:
> On Tue Feb 11, 2025 at 9:04 AM CET, zhang.guanghui@xxxxxxxx wrote:
>> Hi
>>
>> This is a race issue, I can't reproduce it stably yet. I have not tested
>> the latest kernel, but in fact I've synced some nvme-tcp patches from the
>> latest upstream.
>
> Hello, could you try this patch?
>
> queue_lock should protect against concurrent "error recovery",
> + mutex_lock(&queue->queue_lock);

Unfortunately I've just realized that queue_lock won't save us from the race
against the controller reset; it's still possible we lock a destroyed mutex.

So just try this simplified patch, I will try to figure out something else:

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 841238f38fdd..b714e1691c30 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2660,7 +2660,10 @@ static int nvme_tcp_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 	set_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	if (sk_can_busy_loop(sk) && skb_queue_empty_lockless(&sk->sk_receive_queue))
 		sk_busy_loop(sk, true);
+
+	mutex_lock(&queue->send_mutex);
 	nvme_tcp_try_recv(queue);
+	mutex_unlock(&queue->send_mutex);
 	clear_bit(NVME_TCP_Q_POLLING, &queue->flags);
 	return queue->nr_cqe;
 }

Maurizio
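[Editor's note: the patch above serializes the poll-side receive path against the send path by taking queue->send_mutex around nvme_tcp_try_recv(), so a completion cannot free a request the sender is still touching. The sketch below is a minimal userspace analogy of that serialization pattern, not the kernel code: pthread_mutex stands in for queue->send_mutex, and the names run_demo, worker, and ITERS are invented for illustration.]

/*
 * Userspace analogy: two threads (think "send path" and "poll/recv
 * path") mutate shared per-queue state. Taking the same mutex in both
 * paths makes the updates mutually exclusive, so no update is lost
 * and neither path observes the other mid-operation.
 */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define ITERS 100000

static pthread_mutex_t send_mutex = PTHREAD_MUTEX_INITIALIZER;
static long inflight_state; /* stands in for shared request state */

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERS; i++) {
		pthread_mutex_lock(&send_mutex);
		inflight_state++; /* non-atomic: racy without the lock */
		pthread_mutex_unlock(&send_mutex);
	}
	return NULL;
}

long run_demo(void)
{
	pthread_t send_thread, poll_thread;

	inflight_state = 0;
	pthread_create(&send_thread, NULL, worker, NULL);
	pthread_create(&poll_thread, NULL, worker, NULL);
	pthread_join(send_thread, NULL);
	pthread_join(poll_thread, NULL);
	return inflight_state;
}

int main(void)
{
	long total = run_demo();
	assert(total == 2L * ITERS); /* no lost updates under the mutex */
	printf("total=%ld\n", total);
	return 0;
}

Note this analogy does not model the second hazard discussed above (locking a mutex whose owner has already destroyed it during controller reset); it only shows why serializing the two paths on one lock prevents the concurrent-access half of the race.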