Ben Hutchings <ben@xxxxxxxxxxxxxxx> writes:

> On Thu, 2015-10-01 at 11:15 -0500, Eric W. Biederman wrote:
>> With a strategically placed rename bind mounts can be tricked into
>> giving processes access to the entire filesystem instead of just a piece
>> of it.  This misfeature has existed since bind mounts were introduced
>> into the kernel.  This issue has been fixed in Linus's tree and below
>> are my tested backports of the fixes to 4.2.1, 4.1.8, 3.18.21, 3.14.53,
>> 3.12.48, 3.10.89, 3.4.109, 3.2.71, 2.6.32.68.  All of the kernels
>> currently listed as being active.
>
> I'm not convinced that this is necessary for the 2.6.32, 3.2 or 3.4
> stable branches.  While it is possible for an administrator to screw
> this up, there is no possibility of a user being able to exploit this
> from a user namespace where they have namespaced-CAP_SYS_ADMIN.

It is cheap and easy to fix.  I made and tested the changes.  So why
not?

Nothing about the bug or the exploit depends on user namespaces; user
namespaces just make it 100% reliable to arrange the necessary
conditions to be able to escape a bind mount.

I don't think anyone even knew what to look for to allow or prevent this
until just recently.  So I hesitate to say an administrator messed up
if no one understood the issue existed.

>> The fixes backported are:
>> cde93be45a8a90d8c264c776fab63487b5038a65 dcache: Handle escaped paths in prepend_path
>> 397d425dc26da728396e66d392d5dcb8dac30c37 vfs: Test for and handle paths that are unreachable from their mnt_root
>
> For 3.16 I started with:
>
> 70291aecc6aa228c1b3bb36a5f3efdb0af636042 namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers
>
> which then made the other two trivial to apply.  I think that would
> also work for 3.14 and 3.18.

It probably does fix the issue.  Without applying
70291aecc6aa228c1b3bb36a5f3efdb0af636042 and reading the code of
fs/namei.c I can not tell.
That would probably take me an hour and I am not volunteering that
time right now.

That backport addresses the issues I can think of off the top of my head
for 3.14 and 3.18, but there are a lot of subtle dependencies in
fs/namei.c.

What I know is that there were a number of kernels where my patch that
added a return and a return code to follow_dotdot applied cleanly but
did not work correctly.  mountpoint_last was a factor, as was the
movement/consolidation of terminate_walk().

I found it easier to adapt my change to follow_dotdot so that it does
not need the movement of terminate_walk() than to figure out which part
of the cleanup would be needed to remove that need.

For a backport it seemed the better part of valor to make the necessary
changes as small and as locally correct as I could, since with less
code it is harder to get it wrong.

>> As I backported the patches the logical work remained the same but the
>> exact implementation details changed to fit in with the vfs present in
>> the older kernels.  Minor changes were needed for the backport to
>> every kernel except 4.2.1.
>>
>> Please queue these changes for the appropriate stable trees.
>
> For 4.2, I had the idea that this one was needed too:
>
> a03e283bf5c3d4851b4998122196ce9f849e6dfb dcache: Reduce the scope of i_lock in d_splice_alias
>
> but perhaps that is just cleanup/optimisation?

Yes, it has no immediate bearing on this issue as it was fixed.  It is
halfway to fixing the locking craziness in d_splice_alias, and a more
ambitious fix could probably take advantage of that.  But the more
ambitious fix was not

Yes, that is just a cleanup.  It is also halfway to removing the
locking craziness in d_splice_alias, which is a related but different
battle.

Eric