Re: [PATCHES] Bind mount escape fixes (CVE-2015-2925)

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Thu, 01 Oct 2015 22:28:10 -0500

Ben Hutchings <ben@xxxxxxxxxxxxxxx> writes:

> On Thu, 2015-10-01 at 11:15 -0500, Eric W. Biederman wrote:
>> With a strategically placed rename bind mounts can be tricked into
>> giving processes access to the entire filesystem instead of just a piece
>> of it.  This misfeature has existed since bind mounts were introduced
>> into the kernel.  This issue has been fixed in Linus's tree and below
>> are my tested backports of the fixes to 4.2.1, 4.1.8, 3.18.21, 3.14.53,
>> 3.12.48, 3.10.89, 3.4.109, 3.2.71, and 2.6.32.68, that is, all of the
>> kernels currently listed as being active.
>
> I'm not convinced that this is necessary for the 2.6.32, 3.2 or 3.4
> stable branches.  While it is possible for an administrator to screw
> this up, there is no possibility of a user being able to exploit this
> from a user namespace where they have namespaced-CAP_SYS_ADMIN.

It is cheap and easy to fix, and I have made and tested the changes.  So
why not?

Nothing about the bug or the exploit depends on user namespaces.  User
namespaces just make it 100% reliable to arrange the conditions
necessary to escape a bind mount.  I don't think anyone even knew what
to look for to allow or prevent this until just recently, so I hesitate
to say an administrator messed up when no one understood that the issue
existed.
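
To make the mechanism concrete, here is a rough userspace toy model of
the escape.  It is not kernel code: the toy_* names, the struct layout
and the check_connected flag are all made up for illustration.  The
only things taken from the real fix are the idea of testing whether the
current position is still reachable from the mount root after a ".."
step, and failing the walk if it is not (I am assuming -ENOENT as the
error here).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy dentry (hypothetical): a name and a parent; the root is its own parent. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent;
};

/* Return 1 if d is ancestor or lies below it, 0 otherwise. */
static int toy_is_subdir(const struct toy_dentry *d,
			 const struct toy_dentry *ancestor)
{
	for (;;) {
		if (d == ancestor)
			return 1;
		if (d->parent == d)	/* hit the filesystem root */
			return 0;
		d = d->parent;
	}
}

/*
 * One ".." step for a walk confined to the bind mount rooted at mnt_root.
 * check_connected == 0 models the old behaviour, 1 models the fixed one.
 */
static int toy_follow_dotdot(struct toy_dentry **pos,
			     const struct toy_dentry *mnt_root,
			     int check_connected)
{
	if (*pos == mnt_root)
		return 0;		/* ".." at the mount root stays put */
	*pos = (*pos)->parent;
	if (check_connected && !toy_is_subdir(*pos, mnt_root))
		return -ENOENT;		/* no longer reachable from mnt_root */
	return 0;
}

/*
 * Tree: /, /export (bind mount root), /export/a/b.  A process sits in b,
 * then a is renamed to /a, i.e. out from under the bind mount root.
 * Walk ".." a few times and report where we end up.
 */
static int toy_demo(int check_connected, const char **where)
{
	struct toy_dentry root = { "/", NULL };
	struct toy_dentry exp_root = { "export", &root };
	struct toy_dentry a = { "a", &exp_root };
	struct toy_dentry b = { "b", &a };
	struct toy_dentry *pos = &b;
	int err = 0;
	int i;

	root.parent = &root;
	a.parent = &root;		/* rename("/export/a", "/a") */

	for (i = 0; i < 8 && !err; i++)
		err = toy_follow_dotdot(&pos, &exp_root, check_connected);

	*where = pos->name;
	return err;
}
```

With check_connected set to 0 the walk never meets the mount root, so
the only stop condition is skipped and it ends at "/".  With it set to 1
the first ".." step fails because "a" is no longer below the mount root.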

>> The fixes backported are:
>> cde93be45a8a90d8c264c776fab63487b5038a65 dcache: Handle escaped paths in prepend_path
>> 397d425dc26da728396e66d392d5dcb8dac30c37 vfs: Test for and handle paths that are unreachable from their mnt_root
>
> For 3.16 I started with:
>
> 70291aecc6aa228c1b3bb36a5f3efdb0af636042 namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers
>
> which then made the other two trivial to apply.  I think that would
> also work for 3.14 and 3.18.

It probably does fix the issue.  Without applying
70291aecc6aa228c1b3bb36a5f3efdb0af636042 and reading the code of
fs/namei.c I cannot tell.  That would probably take me an hour and I am
not volunteering that time right now.  That backport addresses the
issues I can think of off the top of my head for 3.14 and 3.18,
but there are a lot of subtle dependencies in fs/namei.c.

What I know is that there were a number of kernels where my patch that
added a return and a return code to follow_dotdot applied cleanly but
did not work correctly.  mountpoint_last was a factor, as was the
movement/consolidation of terminate_walk().

I found it easier to adapt my change to follow_dotdot to not need
the movement of terminate_walk() than to figure out which part of
the cleanup would be needed to remove the need.

For a backport it seemed the better part of valor to make the necessary
changes as small and as locally correct as I could, since with less code
it is harder to get it wrong.

>> As I backported the patches the logical work remained the same but the
>> exact implementation details changed to fit in with the vfs present in
>> the older kernels.  Minor changes were needed for the backport to
>> every kernel except 4.2.1.
>>
>> Please queue these changes for the appropriate stable trees.
>
> For 4.2, I had the idea that this one was needed too:
>
> a03e283bf5c3d4851b4998122196ce9f849e6dfb dcache: Reduce the scope of i_lock in d_splice_alias
>
> but perhaps that is just cleanup/optimisation?

Yes, that is just a cleanup.  It has no immediate bearing on this issue
as it was fixed.

It is also halfway to removing the locking craziness in d_splice_alias,
and a more ambitious fix could probably take advantage of that.  But
that is a related but different battle.

Eric