On Mon, Jul 08, 2019 at 12:12:21PM -0500, Eric W. Biederman wrote: > Al you do realize that the TOCTOU you are talking about comes the system > call API. TOMOYO can only be faulted for not playing in their own > sandbox and not reaching out and fixing the vfs implementation details. > > Userspace has always had to very careful to only mount filesystems > on paths that root completely controls and won't change. That has nothing whatsoever to do with the path where you are mounting something. _That_ is actually looked up before ->sb_mount() gets called; no TOCTOU there. The thing where ->sb_mount() is fucked by design is its handling of * device name * old tree in mount --bind * old tree in mount --move * things like journal name (not that any of the instances had tried to do anything with that) All of those *do* have TOCTOU, and that's an inevitable result of the idiotic hook fetishism of LSM design. Instead of "we want something to happen when such-and-such predicate is about to change", it's "lemme run my code, the earlier the better, I don't care about any damn predicates, it's all too complicated anyway, whaddya mean racy?" Any time you have pathname resolution done twice, it's a built-in race. If you want *ALL* checks on mount(2) to be done before the mean, nasty kernel code gets to decide anything (bind/move/mount/etc. all squashed together, just let us have at the syscall arguments, mmkay?) - that's precisely what you get. And no, that TOCTOU is not in syscall API. "open() of an untrusted pathname may end up trying to open hell knows what" is one thing; "open() of an untrusted pathname may apply MAC checks to one object and open something entirely different" is another. The former is inherent to syscall API. The latter would be a badly fucked up implementation (we don't have that issue on open(2), thankfully). To make it clear, TOMOYO is not at fault here; LSM "architecture" is. Note, BTW, that TOMOYO checks there do *NOT* limit the input pathname at all - only the destination of the first pathwalk. Repeating it may easily lead to an entirely different place. Canonicalized pathname is derived from pathwalk result; having concluded that it's perfectly fine for the operation requested is pure security theatre - it * says nothing about the trustedness of the original pathname * may have nothing whatsoever to the object yielded by the second pathwalk, which is what'll end up actually used. It's not even "this thing walks through /proc, and thus not to be trusted to be stable" - the checks won't notice where the damn thing had been. When somebody proposes _useful_ MAC for mount --move (and that really can't be done at the level of syscall entry - we need to have already figured out that with given combination of flags the 1st argument of mount(2) will be a pathname *and* already looked it up), sure - it will be added to do_move_mount(), which is where we have all lookups done, and apply both for mount() and move_mount(). Right now anyone relying upon DAC enforced for MS_MOVE has worse problems than "attacker will use move_mount(2) and bypass my policy" - the same attacker can bloody well bypass those with nothing more exotic than clone(2) and dup2(2) (and mount(2), of course). And it's not just MS_MOVE (or MS_BIND). Anyone trying to prevent mounting e.g. ext2 from untrusted device and do that on the level of ->sb_mount() *is* *bloody* *well* *fucked*. ->sb_mount() is simply the wrong place for that.