Hi Al,

On Tue, Oct 2, 2018 at 5:58 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Oct 02, 2018 at 01:18:30PM +0200, Ondrej Mosnacek wrote:
>
> No. With the side of Hell, No. The bug is real, but this is
> not the way to fix it.
>
> First of all, it's still broken - e.g. mount something on a
> subdirectory and watch what that thing will do to it. And
> anyone who has permission to reload policy _will_ have one
> for mount.

I have no doubts there are still tons of bugs left over, but having
processes traverse selinuxfs while load_policy is in progress is
something that can (and will) easily happen in practice. Mounting over
an selinuxfs subdirectory OTOH isn't something that you would normally
do. I think it is worth doing a quick but partial fix that fixes a
practical problem and then working towards a better long-term solution.

I feel like you are assuming that I am trying to fix some security
problem here, but that's not true. It *may* be seen as a security
issue, but as you point out having permission to load policy will
usually imply you can do other kinds of damage anyway. I simply see
this as an annoying real-life bug (I know of at least one user that
is/was affected) that I want to fix (even if not in a perfect way for
now).

>
> And yes, debugfs_remove() suffers the same problem. Frankly, the
> only wish debugfs_remove() implementation inspires is to shoot it
> before it breeds. Which is the second problem with that approach -
> such stuff really shouldn't be duplicated in individual filesystems.
>
> Don't copy (let along open-code) d_walk() in there. The guts of
> dcache tree handling are subtle enough (and had more than enough
> locking bugs over the years) to make spreading the dependencies
> on its details all over the tree an invitation for massive PITA
> down the road.

Right, I am now convinced that it is not a good idea at all.
The thought of adding a new function to dcache has crossed my mind, but
I dismissed it as I didn't want to needlessly touch other parts of the
kernel. Looking back now, it would have been a better choice indeed.

>
> I have beginnings of patch doing that for debugfs; the same thing
> should be usable for selinuxfs as well. However, the main problem
> with selinuxfs wrt policy loading is different - what to do if
> sel_make_policy_nodes() fails halfway through? And what to do
> with accesses to the unholy mix of old and new nodes we currently
> have left behind?

Again, this is a big problem, too, but it is out of the scope I care
about right now.

>
> Before security_load_policy() we don't know which nodes to create;
> after it we have nothing to fall back onto. It looks like we
> need to split security_load_policy() into "load", "switch over" and
> "free old" parts, so that the whole thing would look like
>         load
>         create new directory trees, unconnected to the root so
>                 they couldn't be reached by lookups until we are done
>         switch over
>         move the new directory trees in place
>         kill the old trees (using that invalidate+genocide carefully
>                 combination)
>         free old data structures
> with failures upon the first two steps handled by killing whatever detached
> trees we'd created (if any) and freeing the new data structures.
>
> However, I'd really like to have the folks familiar with selinux guts to
> comment upon the feasibility of the above. AFAICS, nobody has ever seriously
> looked at that code wrt graceful error handling, etc.[*], so I'm not happy
> with making inferences from what the existing code is doing.

Yes, that sounds like it could work. I'd be willing to work on that as
a longer term solution. Let's hope we get some feedback from them.

>
> If you are interested in getting selinuxfs into sane shape, that would
> be a good place to start. As for the kernel-side rm -rf (which is what
> debugfs_remove() et.al. are trying to be)...
>         * it absolutely needs to be in fs/*.c - either dcache or libfs.
> It's too closely tied to dcache guts to do otherwise.
>         * as the first step it needs to do d_invalidate(), to get rid of
> anything that might be mounted on it and to prevent new mounts from appearing.
> It's rather tempting to unhash everything in the victim tree at the same
> time, but that needs to be done with care - I'm still not entirely happy
> with the solution I've got in that area. Alternative is to unhash them
> on the way out of subtree. simple_unlink()/simple_rmdir() are wrong
> there - we don't want to bother with the parent's timestamps as we go,
> for one thing; that should be done only once to parent of the root of
> that subtree. For another, we bloody well enforce the emptiness ourselves,
> so this simple_empty() is pointless (especially since we have no choice other
> than ignoring it anyway).

Right, I was suspicious about the simple_*() functions anyway, I simply
wanted to stay close to what the old code and debugfs were doing (turns
out they were both wrong anyway).

So, it sounds like you are already working on a better fix... Should I
wait for your patch(es) or should I try again? There is no rush, I just
want to know if it makes sense for me to still work on it :)

>
> BTW, another selinuxfs unpleasantness is, the things like sel_write_enforce()
> don't have any exclusion against themselves, let alone the policy reloads.
> And quite a few of them obviously expect that e.g. permission check is done
> against the same policy the operation will apply to, not the previous one.
> That one definitely needs selinux folks involved.
>
> [*] not too unreasonably so - anyone who gets to use _that_ as an attack
> vector has already won, so it's not a security problem pretty much by
> definition and running into heavy OOM at the time of policy reload is
> almost certainly going to end up with the userland parts of the entire
> thing not handling failures gracefully.

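The "load" / "switch over" / "free old" split proposed above can be
modelled in user space roughly as follows. This is only a sketch of the
ordering and the error handling, not kernel code: struct policy,
struct tree, struct state, policy_load(), tree_build_detached() and
reload() are all hypothetical names invented for illustration, and the
fail_build flag merely simulates a failure while the new trees are being
created. The point it tries to show is that everything that can fail
happens before the switch, so a failed reload leaves the live state
untouched.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the policy data and the directory trees. */
struct policy { int version; };
struct tree   { int version; };

/* What lookups currently see. */
struct state {
	struct policy *policy;
	struct tree *tree;
};

/* "load": parse the new policy; a negative version models a parse failure. */
static struct policy *policy_load(int version)
{
	struct policy *p;

	if (version < 0)
		return NULL;
	p = malloc(sizeof(*p));
	if (p)
		p->version = version;
	return p;
}

/* Build the new tree detached from the root; 'fail' models an error halfway. */
static struct tree *tree_build_detached(const struct policy *p, int fail)
{
	struct tree *t;

	if (fail)
		return NULL;
	t = malloc(sizeof(*t));
	if (t)
		t->version = p->version;
	return t;
}

/* Returns 0 on success, -1 on failure with the live state left as it was. */
static int reload(struct state *s, int version, int fail_build)
{
	struct policy *new_policy, *old_policy;
	struct tree *new_tree, *old_tree;

	assert(s != NULL);

	/* Steps that may fail: nothing is visible to lookups yet. */
	new_policy = policy_load(version);
	if (!new_policy)
		return -1;
	new_tree = tree_build_detached(new_policy, fail_build);
	if (!new_tree) {
		free(new_policy);	/* drop the new data, keep the old */
		return -1;
	}

	/* "switch over" and move the new tree in place: cannot fail. */
	old_policy = s->policy;
	old_tree = s->tree;
	s->policy = new_policy;
	s->tree = new_tree;

	/* Kill the old tree, then "free old" data structures. */
	free(old_tree);
	free(old_policy);
	return 0;
}
```

A caller would start from an empty struct state and call
reload(&s, version, 0); after a failed call, s.policy and s.tree still
point at the previous, mutually consistent pair. What the sketch does
not model is the hard part discussed above: the dcache side of detaching
and killing trees, and exclusion against concurrent writers.
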
Thanks a lot for your comments!

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.