ср, 1 апр. 2020 г. в 05:50, Aurelien Aptel <aaptel@xxxxxxxx>: > > Fix tcon use-after-free and NULL ptr deref. > > Customer system crashes with the following kernel log: > > [462233.169868] CIFS VFS: Cancelling wait for mid 4894753 cmd: 14 => a QUERY DIR > [462233.228045] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.305922] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.306205] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.347060] CIFS VFS: cifs_put_smb_ses: Session Logoff failure rc=-4 > [462233.347107] CIFS VFS: Close unmatched open > [462233.347113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 > ... > [exception RIP: cifs_put_tcon+0xa0] (this is doing tcon->ses->server) > #6 [...] smb2_cancelled_close_fid at ... [cifs] > #7 [...] process_one_work at ... > #8 [...] worker_thread at ... > #9 [...] kthread at ... > > The most likely explanation we have is: > > * When we put the last reference of a tcon (refcount=0), we close the > cached share root handle. > * If closing a handle is interupted, SMB2_close() will > queue a SMB2_close() in a work thread. > * The queued object keeps a tcon ref so we bump the tcon > refcount, jumping from 0 to 1. > * We reach the end of cifs_put_tcon(), we free the tcon object despite > it now having a refcount of 1. > * The queued work now runs, but the tcon, ses & server was freed in > the meantime resulting in a crash. > > THREAD 1 > ======== > cifs_put_tcon => tcon refcount reach 0 > SMB2_tdis > close_shroot_lease > close_shroot_lease_locked => if cached root has lease && refcount reach 0 > smb2_close_cached_fid => if cached root valid > SMB2_close => retry close in a worker thread if interrupted > smb2_handle_cancelled_close > __smb2_handle_cancelled_close => !! tcon refcount bump 0 => 1 !! > INIT_WORK(&cancelled->work, smb2_cancelled_close_fid); > queue_work(cifsiod_wq, &cancelled->work) => queue work > tconInfoFree(tcon); ==> freed! > cifs_put_smb_ses(ses); ==> freed! > > THREAD 2 (workqueue) > ======== > smb2_cancelled_close_fid > SMB2_close(0, cancelled->tcon, ...); => use-after-free of tcon > cifs_put_tcon(cancelled->tcon); => tcon refcount reach 0 second time > *CRASH* > > Fixes: d9191319358d ("CIFS: Close cached root handle only if it has a lease") > Signed-off-by: Aurelien Aptel <aaptel@xxxxxxxx> > --- > fs/cifs/smb2misc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/fs/cifs/smb2misc.c b/fs/cifs/smb2misc.c > index 0511aaf451d4..965276aeffcf 100644 > --- a/fs/cifs/smb2misc.c > +++ b/fs/cifs/smb2misc.c > @@ -766,6 +766,12 @@ smb2_handle_cancelled_close(struct cifs_tcon *tcon, __u64 persistent_fid, > > cifs_dbg(FYI, "%s: tc_count=%d\n", __func__, tcon->tc_count); > spin_lock(&cifs_tcp_ses_lock); > + if (tcon->tc_count <= 0) { > + spin_unlock(&cifs_tcp_ses_lock); > + cifs_dbg(VFS, "tcon is closing, skipping async close retry of fid %llu %llu\n", Thanks for a good catch! Some suggestions below: - Why VFS? Share is being disconnected anyway, so all associated handles should be closed by the server eventually. I think FYI is sufficient here. - cifs_dbg_server may be better than cifs_dbg and please include tcon ID here otherwise FIDs may be meaningless if there are several shares mounted on a host. - May be WARN_ONCE if tc_count is negative since it is a possible bug? Reviewed-by: Pavel Shilovsky <pshilov@xxxxxxxxxxxxx> -- Best regards, Pavel Shilovsky