Re: [PATCH] ceph: fix potential use-after-free in ceph_mdsc_build_path

Jeff Layton <jlayton@xxxxxxxxxx> · Fri, 26 Apr 2019 19:16:13 -0400

On Fri, 2019-04-26 at 22:40 +0100, Al Viro wrote:
> On Fri, Apr 26, 2019 at 05:12:38PM -0400, Jeff Layton wrote:
> > temp is not defined outside of the RCU critical section here. Ensure
> > we grab that value before we drop the rcu_read_lock.
> > +	base = ceph_ino(d_inode(temp));
> >  	rcu_read_unlock();
> 
> Umm...  Freeing (including freeing the name) is postponed by holding
> rcu_read_lock().  Children moving away + dentry going negative
> is *not*.
> 

Yep. This patch is just aimed at "let's prevent an oops". Whether the
resulting string makes any sense is another matter entirely.

> What are you trying to return there, anyway?  Root or, in case of
> stop_on_nosnap, CEPH_NOSNAP one you'd stepped into?
> 

Correct, AFAIU.

> The latter I'd suggest to handle while under ->d_lock; the former
> ought to be safe if it's fs root.  Details, please...

I'm still rather new to this code, but...basically stop_on_nonsnap is
always set, other than in two cases:

1/ printing pathnames (generally for debugfs)

2/ legacy protocol version handling

The first we don't really care too much about correctness in the face of
races, only that we have a consistent string of some sort. The second
might matter, but those really old protocol versions are not in wide
usage. Ceph has a lot of legacy cruft.

In any case, most of the calls that involve filenames end up getting a
parent inode number and a single dentry name. For snapshots you'll get a
non-snapped parent inode number (generally the one of the .snap
directory) and full path from there to the dentry. Snapshots are read-
only though so in principle that tree should be relatively static.

> Another fun question is whether you can hit a disconnected subtree
> from open-by-fhandle in process.  That might get uncomfortable,
> since you'd get the tail of actual pathname and the length will
> depend upon the timing.

Before it returns, ceph_mdsc_build_path does this:

if (pos != 0 || read_seqretry(&rename_lock, seq)) {
	kfree(path);
	goto retry;
}

If we end up with a path that has a different length than the one we
originally allocated, we'll do the whole thing over again.

I think the only place that's really a problem is when we're talking to
a legacy protocol server? Maybe one of the ceph maintainers can step in
and clarify this point.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>