пн, 6 апр. 2020 г. в 04:34, Aurelien Aptel <aaptel@xxxxxxxx>: > > Fix tcon use-after-free and NULL ptr deref. > > Customer system crashes with the following kernel log: > > [462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR > [462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.347107] CIFS VFS: Close unmatched open > [462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > ... > [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server) > #6 [...] smb2_cancelled_close_fid at ... [cifs] > #7 [...] process_one_work at ... > #8 [...] worker_thread at ... > #9 [...] kthread at ... > > The most likely explanation we have is: > > * When we put the last reference of a tcon (refcount=0), we close the > cached share root handle. > * If closing a handle is interrupted, SMB2_close() will > queue a SMB2_close() in a work thread. > * The queued object keeps a tcon ref so we bump the tcon > refcount, jumping from 0 to 1. > * We reach the end of cifs_put_tcon(), we free the tcon object despite > it now having a refcount of 1. > * The queued work now runs, but the tcon, ses & server was freed in > the meantime resulting in a crash. > > THREAD 1 > ======== > cifs_put_tcon => tcon refcount reach 0 > SMB2_tdis > close_shroot_lease > close_shroot_lease_locked => if cached root has lease && refcount = 0 > smb2_close_cached_fid => if cached root valid > SMB2_close => retry close in a thread if interrupted > smb2_handle_cancelled_close > __smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !! > INIT_WORK(&cancelled->work, smb2_cancelled_close_fid); > queue_work(cifsiod_wq, &cancelled->work) => queue work > tconInfoFree(tcon); ==> freed! > cifs_put_smb_ses(ses); ==> freed! > > THREAD 2 (workqueue) > ======== > smb2_cancelled_close_fid > SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon > cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time > *CRASH* > > Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease") > Signed-off-by: Aurelien Aptel <aaptel@xxxxxxxx> > Signed-off-by: Steve French <stfrench@xxxxxxxxxxxxx> > --- > changes since v2: > * add WARN_ONCE > * show server hostname & tid > * VFS -> FYI > > fs/cifs/smb2misc.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c > index 0511aaf451d4..cfa57e3ad352 100644 > --- a/fs/cifs/smb2misc.c > +++ b/fs/cifs/smb2misc.c > @@ -766,6 +766,21 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid, > > cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count); > spin_lock(&cifs_tcp_ses_lock); > + if (tcon->tc_count <= 0) { > + const char *host = "(null)"; > + > + WARN_ONCE(tcon->tc_count < 0, "tcon refcount is negative"); > + spin_unlock(&cifs_tcp_ses_lock); > + > + if (tcon->ses && tcon->ses->server > + && tcon->ses->server->hostname) > + host = tcon->ses->server->hostname; > + > + cifs_dbg(FYI, "\\\\%s tid=%u: tcon is closing, skipping async close retry of fid %llu %llu\n", > + host, tcon->tid, persistent_fid, volatile_fid); You can use cifs_server_dbg() directly - it is adding prefix with server name automatically, you just need a server variable in the calling function. > + > + return 0; > + } > tcon->tc_count++; > spin_unlock(&cifs_tcp_ses_lock); > > -- > 2.16.4 > -- Best regards, Pavel Shilovsky