Paul Aurich <paul@xxxxxxxxxxxxxx> writes: > On 2024-11-21 23:05:51 -0300, Paulo Alcantara wrote: >> >>Why does it need to be a global workqueue? Can't you make it per tcon? > > The problem with a per-tcon workqueue is I didn't see clean way to deal with > multiuser mounts and flushing the workqueue in close_all_cached_dirs() -- when > dealing with each individual tcon, we're still holding tlink_tree_lock, so an > arbitrary sleep seems problematic. OK. > There could be a per-sb workqueue (stored in cifs_sb or the master tcon) but > is there a way to get back to the superblock / master tcon with just a tcon > (e.g. cached_dir_lease_break, when processing a lease break)? Yes - cifs_get_dfs_tcon_super() does that. >>> The final cleanup work for cleaning up a cfid is performed via work >>> queued in the serverclose_wq workqueue; this is done separate from >>> dropping the dentries so that close_all_cached_dirs() doesn't block on >>> any server operations. >>> >>> Both of these queued works expect to invoked with a cfid reference and >>> a tcon reference to avoid those objects from being freed while the work >>> is ongoing. >> >>Why do you need to take a tcon reference? > > In the existing code (and my patch, without the refs), I was seeing an > intermittent use-after-free of the tcon or cached_fids struct by queued work > processing a lease break -- the cfid isn't linked from cached_fids, but > smb2_close_cached_fid invoking SMB2_close can race with the unmount and > cifs_put_tcon > > Something like: > > t1 t2 > cached_dir_lease_break > smb2_cached_lease_break > smb2_close_cached_fid > SMB2_close starts > cifs_kill_sb > cifs_umount > cifs_put_link > cifs_put_tcon > SMB2_close continues Makes sense. > I had a version of the patch that kept the 'in flight lease breaks' on > a second list in cached_fids so that they could be cancelled synchronously > from free_cached_fids(), but I struggled with it (I can't remember exactly, > but I think I was struggling to get the linked list membership / removal > handling and num_entries handling consistent). No worries. The damn thing isn't trivial to follow. >> Can't you drop the dentries >>when tearing down tcon in cifs_put_tcon()? No concurrent mounts would >>be able to access or free it. > > The dentries being dropped must occur before kill_anon_super(), as that's > where the 'Dentry still in use' check is. All the tcons are put in > cifs_umount(), which occurs after: > > kill_anon_super(sb); > cifs_umount(cifs_sb); Right. Can't we call cancel_work_sync() to make sure that any leases breakes are processed on the cached directory handle before calling the above? > The other thing is that cifs_umount_begin() has this comment, which made me > think a tcon can actually be tied to two distinct mount points: > > if ((tcon->tc_count > 1) || (tcon->status == TID_EXITING)) { > /* we have other mounts to same share or we have > already tried to umount this and woken up > all waiting network requests, nothing to do */ > > Although, as I'm thinking about it again, I think I've misunderstood (and that > comment is wrong?). Comment is correct as a single tcon may be shared among different mounts. Consider the following where a single tcon is shared: mount.cifs //srv/share /mnt/1 -o $opts mount.cifs //srv/share/dir /mnt/2 -o $opts There will be two different superblocks that end up using same tcon. > It did cross my mind to pull some of the work out of cifs_umount into > cifs_kill_sb (specifically, I wanted to cancel prune_tlinks earlier) -- no > prune_tlinks would make it more feasible to drop tlink_tree_lock in > close_all_cached_dirs(), at which point a per-tcon workqueue is more > practical. Yeah, multiuser tcons just make it even more complicated. Sorry for the delay as I've been quite busy with other stuff. Great work, BTW.