Agree. Probably the change in dequeue_mid() is not needed but won't hurt at least - right now dequeue_mid() is being called from the demultiplex thread only, so as cifs_reconnect(). I am wondering how your patch behaves with the repro. In general, I am starting to think more that we should probably remove a MID immediately from the pending list once we parse MessageId from the response and find the entry in the list. Especially with the recent parallel decryption capability that Steve is working on, we would need to break the above assumption and process the mid entry in another thread. There are some cases where we don't end up removing the MID but for those cases we may simply add the entry back. Anyway, it needs much more thinking and out of the scope of the bugfix being discussed. -- Best regards, Pavel Shilovsky -----Original Message----- From: Ronnie Sahlberg <lsahlber@xxxxxxxxxx> Sent: Thursday, October 17, 2019 3:54 PM To: Pavel Shilovskiy <pshilov@xxxxxxxxxxxxx> Cc: David Wysochanski <dwysocha@xxxxxxxxxx>; linux-cifs <linux-cifs@xxxxxxxxxxxxxxx>; Frank Sorenson <sorenson@xxxxxxxxxx> Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 That could work. But then we should also use that flag to suppress the other places where we do a list_del*, so something like this ? diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 50dfd9049370..b324fff33e53 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error) /* Flags */ #define MID_WAIT_CANCELLED 1 /* Cancelled while waiting for response */ #define MID_DELETED 2 /* Mid has been dequeued/deleted */ +#define MID_RECONNECT 4 /* Mid is being used during reconnect */ /* Types of response buffer returned from SendReceive2 */ #define CIFS_NO_BUFFER 0 /* Response buffer not returned */ diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index bdea4b3e8005..b142bd2a3ef5 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server) spin_lock(&GlobalMid_Lock); list_for_each_safe(tmp, tmp2, &server->pending_mid_q) { mid_entry = list_entry(tmp, struct mid_q_entry, qhead); + kref_get(&mid_entry->refcount); + mid_entry->mid_flags |= MID_RECONNECT; if (mid_entry->mid_state == MID_REQUEST_SUBMITTED) mid_entry->mid_state = MID_RETRY_NEEDED; list_move(&mid_entry->qhead, &retry_list); @@ -575,7 +577,9 @@ cifs_reconnect(struct TCP_Server_Info *server) list_for_each_safe(tmp, tmp2, &retry_list) { mid_entry = list_entry(tmp, struct mid_q_entry, qhead); list_del_init(&mid_entry->qhead); + mid_entry->callback(mid_entry); + cifs_mid_q_entry_release(mid_entry); } if (cifs_rdma_enabled(server)) { @@ -895,7 +899,7 @@ dequeue_mid(struct mid_q_entry *mid, bool malformed) if (mid->mid_flags & MID_DELETED) printk_once(KERN_WARNING "trying to dequeue a deleted mid\n"); - else + else if (!(mid->mid_flags & MID_RECONNECT)) list_del_init(&mid->qhead); spin_unlock(&GlobalMid_Lock); } diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index 308ad0f495e1..ba4b5ab9cf35 100644 --- a/fs/cifs/transport.c +++ b/fs/cifs/transport.c @@ -173,7 +173,8 @@ void cifs_delete_mid(struct mid_q_entry *mid) { spin_lock(&GlobalMid_Lock); - list_del_init(&mid->qhead); + if (!(mid->mid_flags & MID_RECONNECT)) + list_del_init(&mid->qhead); mid->mid_flags |= MID_DELETED; spin_unlock(&GlobalMid_Lock); @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct TCP_Server_Info *server) rc = -EHOSTDOWN; break; default: - list_del_init(&mid->qhead); + if (!(mid->mid_flags & MID_RECONNECT)) + list_del_init(&mid->qhead); cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu state=%d\n", __func__, mid->mid, mid->mid_state); rc = -EIO;