----- Original Message ----- > From: "Pavel Shilovskiy" <pshilov@xxxxxxxxxxxxx> > To: "Ronnie Sahlberg" <lsahlber@xxxxxxxxxx> > Cc: "David Wysochanski" <dwysocha@xxxxxxxxxx>, "linux-cifs" <linux-cifs@xxxxxxxxxxxxxxx>, "Frank Sorenson" > <sorenson@xxxxxxxxxx>, "Steven French" <Steven.French@xxxxxxxxxxxxx> > Sent: Friday, 18 October, 2019 9:20:51 AM > Subject: RE: list_del corruption while iterating retry_list in cifs_reconnect still seen on 5.4-rc3 > > Agree. > > Probably the change in dequeue_mid() is not needed but won't hurt at least - > right now dequeue_mid() is being called from the demultiplex thread only, so > as cifs_reconnect(). I am wondering how your patch behaves with the repro. Thanks. Dave, can you test with your reproducer if this makes things better? > > In general, I am starting to think more that we should probably remove a MID > immediately from the pending list once we parse MessageId from the response > and find the entry in the list. Especially with the recent parallel > decryption capability that Steve is working on, we would need to break the > above assumption and process the mid entry in another thread. There are some > cases where we don't end up removing the MID but for those cases we may > simply add the entry back. Anyway, it needs much more thinking and out of > the scope of the bugfix being discussed. I agree. Btw, I think we have had bug sin this code since at least the 3.x kernels so we need a simple fix that is easy to backport to stable and beyond. > > -- > Best regards, > Pavel Shilovsky > > -----Original Message----- > From: Ronnie Sahlberg <lsahlber@xxxxxxxxxx> > Sent: Thursday, October 17, 2019 3:54 PM > To: Pavel Shilovskiy <pshilov@xxxxxxxxxxxxx> > Cc: David Wysochanski <dwysocha@xxxxxxxxxx>; linux-cifs > <linux-cifs@xxxxxxxxxxxxxxx>; Frank Sorenson <sorenson@xxxxxxxxxx> > Subject: Re: list_del corruption while iterating retry_list in cifs_reconnect > still seen on 5.4-rc3 > > That could work. But then we should also use that flag to suppress the other > places where we do a list_del*, so something like this ? > > diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index > 50dfd9049370..b324fff33e53 100644 > --- a/fs/cifs/cifsglob.h > +++ b/fs/cifs/cifsglob.h > @@ -1702,6 +1702,7 @@ static inline bool is_retryable_error(int error) > /* Flags */ > #define MID_WAIT_CANCELLED 1 /* Cancelled while waiting for response */ > #define MID_DELETED 2 /* Mid has been dequeued/deleted */ > +#define MID_RECONNECT 4 /* Mid is being used during reconnect */ > > /* Types of response buffer returned from SendReceive2 */ > #define CIFS_NO_BUFFER 0 /* Response buffer not returned */ > diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index > bdea4b3e8005..b142bd2a3ef5 100644 > --- a/fs/cifs/connect.c > +++ b/fs/cifs/connect.c > @@ -564,6 +564,8 @@ cifs_reconnect(struct TCP_Server_Info *server) > spin_lock(&GlobalMid_Lock); > list_for_each_safe(tmp, tmp2, &server->pending_mid_q) { > mid_entry = list_entry(tmp, struct mid_q_entry, qhead); > + kref_get(&mid_entry->refcount); > + mid_entry->mid_flags |= MID_RECONNECT; > if (mid_entry->mid_state == MID_REQUEST_SUBMITTED) > mid_entry->mid_state = MID_RETRY_NEEDED; > list_move(&mid_entry->qhead, &retry_list); @@ -575,7 +577,9 > @@ cifs_reconnect(struct TCP_Server_Info *server) > list_for_each_safe(tmp, tmp2, &retry_list) { > mid_entry = list_entry(tmp, struct mid_q_entry, qhead); > list_del_init(&mid_entry->qhead); > + > mid_entry->callback(mid_entry); > + cifs_mid_q_entry_release(mid_entry); > } > > if (cifs_rdma_enabled(server)) { @@ -895,7 +899,7 @@ > dequeue_mid(struct mid_q_entry *mid, bool malformed) > if (mid->mid_flags & MID_DELETED) > printk_once(KERN_WARNING > "trying to dequeue a deleted mid\n"); > - else > + else if (!(mid->mid_flags & MID_RECONNECT)) > list_del_init(&mid->qhead); > spin_unlock(&GlobalMid_Lock); > } > diff --git a/fs/cifs/transport.c b/fs/cifs/transport.c index > 308ad0f495e1..ba4b5ab9cf35 100644 > --- a/fs/cifs/transport.c > +++ b/fs/cifs/transport.c > @@ -173,7 +173,8 @@ void > cifs_delete_mid(struct mid_q_entry *mid) { > spin_lock(&GlobalMid_Lock); > - list_del_init(&mid->qhead); > + if (!(mid->mid_flags & MID_RECONNECT)) > + list_del_init(&mid->qhead); > mid->mid_flags |= MID_DELETED; > spin_unlock(&GlobalMid_Lock); > > @@ -872,7 +873,8 @@ cifs_sync_mid_result(struct mid_q_entry *mid, struct > TCP_Server_Info *server) > rc = -EHOSTDOWN; > break; > default: > - list_del_init(&mid->qhead); > + if (!(mid->mid_flags & MID_RECONNECT)) > + list_del_init(&mid->qhead); > cifs_server_dbg(VFS, "%s: invalid mid state mid=%llu > state=%d\n", > __func__, mid->mid, mid->mid_state); > rc = -EIO; >