Re: dcache_readdir NULL inode oops

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Fri, 30 Nov 2018 16:08:52 +0000

On Fri, Nov 30, 2018 at 09:16:49AM -0600, Eric W. Biederman wrote:
> >> > +       inode_lock(parent->d_inode);
> >> >         dentry->d_fsdata = NULL;
> >> >         drop_nlink(dentry->d_inode);
> >> >         d_delete(dentry);
> >> > +       inode_unlock(parent->d_inode);
> >> > +
> >> >         dput(dentry);   /* d_alloc_name() in devpts_pty_new() */
> >> >  }
> >
> > This feels right but getting some feedback from others would be good.
> 
> This is going to be special at least because we are not coming through
> the normal unlink path and we are manipulating the dcache.
> 
> This looks plausible.  If this is what's going on then we have had this
> bug for a very long time.  I will see if I can make some time.
> 
> It looks like in the general case everything is serialized by the
> devpts_mutex.  I wonder if just changing the order of operations
> here would be enough.
> 
> AKA: drop_nlink, d_delete, then dentry->d_fsdata.  Ugh, d_fsdata is not
> implicated so that won't help here.

It certainly won't.  The thing is, this
                if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
                              d_inode(next)->i_ino, dt_type(d_inode(next))))
in dcache_readdir() obviously can block, so all we can hold over it is
blocking locks.  Which we do - specifically, ->i_rwsem on our directory.

It's actually worse than missing inode_lock() - consider the effects
of mount --bind /mnt/foo /dev/pts/42.  What happens when that thing
goes away?  Right, a lost mount...

I'll resurrect the "kernel-internal rm -rf done right" series and
post it; devpts is not the only place suffering from such problems
(binfmt_misc, etc.)