Re: races in ll_splice_alias() and elsewhere (ext4, ocfs2)

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 10 Mar 2016 18:55:43 -0500

On Thu, Mar 10, 2016 at 02:59:49AM +0000, Al Viro wrote:
> On Thu, Mar 10, 2016 at 02:20:42AM +0000, Al Viro wrote:
> 
> > Umm...  AFAICS, ext4_d_revalidate() is racy, starting with the very
> > first line.  What's to prevent it being moved while we are calling that?
> > Lose timeslice on preemption, have mv(1) move it elsewhere, followed by
> > rmdir taking the now-empty parent out.  Come back and dir points to
> > freed memory, with ci being complete junk.  Looks like oopsen galore...
> > Ted, am I missing something subtle here?
> 
> BTW, the fact that original parent dentry is pinned by caller doesn't help
> at all - by the time we get to ext4_d_revalidate() its ->d_parent might have
> been pointing to something we are *not* pinning, with another rename() +
> rmdir() completing the problem.  It's going to be hard to hit, but not
> impossible.  Have d_move() happen right after we'd found the match in
> __d_lookup(), then get preempted just as we'd fetched (already changed)
> ->d_parent->d_inode in ext4_d_revalidate().  The second rename() + rmdir()
> have to complete by the time we regain CPU and we are screwed.

The fundamental problem is that ChromeOS wanted root to be able to
delete files for which it didn't have the key.  (The use case is you
have multiple users sharing a single ChromeOS laptop, and you want to
reclaim space from another user's cache directory by deleting files,
even though you don't have the key to decrypt the files' contents or
filename.)  So if you don't have the key, the directory entries are
returned in as a (modified) base-64 encoding of the encrypted
directory entry.

The problem is we've been using the keyring infrastructure, which
means that it's possible for a non-privileged uid (for example,
"chronos") to have access to the key, but for another user (perhaps
the root user) to not have access to the key.  And this violates one
of the fundamental assumptions of the dentry cache, which is that
there is only one view of the namespace for all users.  Some users
might not have permission to access part of the namespace --- but if
they do, it looks the same regardless of whether you are root or a
non-root user.

The ext4_d_revalidate() function was an attempt to square the circle,
but yes, I've been coming to the conclusion that it doesn't work all
that well.  One thing I've been considering is to make access to the
keys a global property.  So the first time a user with access to the
key tries to access an encrypted file, we import the key into mounted
file system's ext4_sb_info structure, and we bump a generation
counter, and then ext4_d_revalidate() simply returns "invalid" for all
denrty entries which (a) refer to an encrypted file or directory, and
(b) whose generation number is less than the current generation
number.

Similarly, if we invalidate a key, we remove the key from tthe keyring
hanging off of the mounted file system's sb_info structure, and then
bump the generation number, which will invalidate the dentries for all
encrypted files/directories on that file system.

This is a bit non-optimal, but I don't see any other way of solving
the problem.  Al, do you have any suggestions?

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html