Re: [PATCH 3/3] ksmbd: fix racy issue from using ->d_parent and ->d_name

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 18 May 2022 16:27:09 +0200

On Tue, May 17, 2022 at 09:21:14PM +0000, Al Viro wrote:
> On Wed, Apr 27, 2022 at 11:32:45AM +0900, Namjae Jeon wrote:
> > Al pointed out that ksmbd has racy issue from using ->d_parent and ->d_name
> > in ksmbd_vfs_unlink and smb2_vfs_rename(). and use new lock_rename_child()
> > to lock stable parent while underlying rename racy.
> > Introduce vfs_path_parent_lookup helper to avoid out of share access and
> > export vfs functions like the following ones to use
> > vfs_path_parent_lookup().
> >  - export __lookup_hash().
> >  - export getname_kernel() and putname().
> > 
> > vfs_path_parent_lookup() is used for parent lookup of destination file
> > using absolute pathname given from FILE_RENAME_INFORMATION request.
> 
> First of all, this is seriously broken:
> 
> > -int ksmbd_vfs_lock_parent(struct user_namespace *user_ns, struct dentry *parent,
> > -			  struct dentry *child)
> > +struct dentry *ksmbd_vfs_lock_parent(struct dentry *child)
> >  {
> > -	struct dentry *dentry;
> > -	int ret = 0;
> > +	struct dentry *parent;
> >  
> > +	parent = dget(child->d_parent);
> >  	inode_lock_nested(d_inode(parent), I_MUTEX_PARENT);
> 
> Do that in parallel with host rename() and you are risking this:
> 
> you: fetch child->d_parent
> 	get preempted away
> another thread: move child elsewhere
> another thread: drop (the last) reference to old parent
> 	memory pressure evicts that dentry and reuses memory
> you: regain the timeslice and bump what you think is parent->d_count.
> 	In reality, it's 4 bytes in completely different data structure.
> 	At that point you already have a memory corruptor.  Worse,
> 	there's a decent chance that subsequent code will revert the
> 	corruption, so it would be hell to debug - you need a race
> 	to reproduce the thing in the first place *and* you need
> 	something else to notice the temporary memory corruption.
> 
> you: fetch what you think is ->d_inode of that dentry.  It actually
> 	isn't anything of that sort.
> you: grab rwsem at hell knows what address (might or might not point
> 	to an rwsem).  Here's another chance to get something
> 	reproducible - e.g. if what you thought was ->d_inode actually
> 	points to unmapped memory, you'll get an oops here.  Won't
> 	be consistent, though.
> 
> > +	if (child->d_parent != parent) {
> you:	->d_parent doesn't point there anymore
> 
> > +		dput(parent);
> 
> you:	decrement those 4 bytes in whatever object it is; if you are
> 	lucky, it won't hit zero and nobody had noticed the temporary
> 	increment.  If you are not, well...
> 
> > +		inode_unlock(d_inode(parent));
> 
> you:	fetch ->d_inode of parent (mind you, it's a bug in its own right -
> 	even if parent hadn't gotten freed before your dget(), after dput()
> 	above it's fair game for getting freed; placing that dput()
> 	before unlocking d_inode() is wrong).  Assuming you've got
> 	the same pointer as the first time around, you proceed to
> 	drop rwsem at the same address where you've grabbed it.
> 
> IOW, you really don't want that in the tree in this form.
> 
> It *might* be partially recoverable if you replace the first dget() with
> dget_parent() and reorder dput() and inode_unlock() in failure case, but...
> some of the callers of that thing are also rather dubious.
> 
> Look: you have smb2_open() calling ksmbd_vfs_may_delete(), which calls
> that thing.  Downstream of this:
> 	if (!file_present) {
> 	...
> 	} else if (!already_permitted) {
> 
> If the parent is *NOT* already locked by that point, just how much is
> your 'file_present' worth?  And if it is, you'd obviously deadlock
> right there and then...
> 
> I'm not sure I like what you've done with added exports - e.g.
> __lookup_hash had been OK as a name of static function, but exporting
> it is asking for clashes.  And honestly, what would you say when running
> into a name like that?  OK, it sounds like it's a (probably low-level)
> lookup in some hash table.  _Maybe_ it would've been fine if we had one
> and only implementation of hash tables in the entire tree and that
> had been a part of it, but it's nothing of that sort.  And "hash" in
> the name is not about doing a hash lookup as opposed to some other
> work (it *does* handle hash misses, allocating dentry, asking filesystem
> to do real on-disk lookup, etc.) - it's actually about "hash function
> of the name is already calculated".  My fault, that - predecessor of
> that thing had been called lookup_one(); it took a string, calculated
> its length and computed hash, then proceeded to do lookups.  The latter
> part could be reused in handling of rmdir et.al., where we already had
> the component length and hash precalculated, so the tail of lookup_one()
> had been carved out into a separate helper.  Circa 2.3.99...
> 
> Anyway, the name is _not_ fit for an export; I'm not sure what to call
> it - lookup_one_qstr(), perhaps?  Additional fun is due to the fact
> that these days it is slightly different from the lookup_one() et.al.
> Those can be called with directory held shared; that allows parallel
> lookups, but it's not free of cost - if we run into a cache miss and need
> to allocate a new dentry and talk to filesystem, we have to recheck the
> hash table after allocation.  __lookup_hash() is called only with parent
> held exclusive and it can skip that fun - hash miss is going to remain
> a miss; nobody else will be able to insert stuff into dcache in that
> directory until we unlock it.
> 
> What I'm worried about is that renaming it to lookup_one_qstr() will
> be an invitation for "oh, we happen to have hash/len already known by the
> time of that lookup_one() call; let's just convert it to lookup_one_qsrt()"
> and if that happens in a place where the parent is held only shared, we'll
> be in trouble.  OTOH, lookup_one_qstr_excl() sounds like an invitation to
> do something painful to whoever's responsible for such name...
> 
> Suggestions, anyone?

Plus, __lookup_hash() doesn't do permission checking so naming it
lookup_one_qstr() seems problematic from that perspective as well...

Don't kill me but:

__excl_qstr_lookup_noperm()

?