On Tue, Jul 21, 2015 at 01:37:21PM -0400, J. Bruce Fields wrote:
> On Fri, Jul 17, 2015 at 12:47:35PM +1000, Dave Chinner wrote:
> > On Thu, Jul 16, 2015 at 07:42:03PM -0500, Eric W. Biederman wrote:
> > > Dave Chinner <david@xxxxxxxxxxxxx> writes:
> > > > The key difference is that desktops only do this when you physically
> > > > plug in a device. With unprivileged mounts, a hostile attacker
> > > > doesn't need physical access to the machine to exploit lurking
> > > > kernel filesystem bugs. i.e. they can just use loopback mounts, and
> > > > they can keep mounting corrupted images until they find something
> > > > that works.
> > >
> > > Yep. That magnifies the problem quite a bit.
> > >
> > > > User namespaces are supposed to provide trust separation. The
> > > > kernel filesystems simply aren't hardened against unprivileged
> > > > attacks from below - there is a trust relationship between root and
> > > > the filesystem in that they are the only things that can write to
> > > > the disk. Mounts from within a userns destroys this relationship as
> > > > the userns root, by definition, is not a trusted actor.
> > >
> > > I talked to Ted Tso a while back and ext4 is at least in principle
> > > already hardened against that kind of attack. I am not certain I
> > > believe it, but if it is true I think it is fantastic.
> >
> > No, it's not. No filesystem is, because to harden against such
> > attacks requires complete verification of all metadata when it is
> > read from disk, before it is used, or some method or ensuring the
> > block was not tampered with. CRCs are not sufficient, because they
> > can be tampered with, too.
> >
> > The only way a filesystem would be able to trust what it reads from
> > disk has not been tampered with in a system with untrusted mounts is
> > if it has some kind of cryptographically secure signature in the
> > metadata and the attacker is unable to access the key for that
> > signature.
>
> Preventing tampering is a little different from protecting the kernel
> from attack, isn't it? I thought the latter was what people were asking
> about.

People might be asking for the latter, but the only attack vector
available against filesystems from below is tampering with the
on-disk structure. An untrusted user in an untrusted container can
construct arbitrary untrusted filesystem structures and get them
parsed by a context running as $DEITY that assumes the structure is
from a trusted source.

What can possibly go wrong?

IOWs, to protect the kernel against attack from untrusted filesystem
images, we either have to be able to guarantee the image cannot be
modified by untrusted parties (i.e. it needs to be created with
signed tools, contain only signed filesystem metadata and
signed/encrypted data), or we have to sandbox the filesystem parsing
code completely (i.e. fuse).

> So, for example, a screwed up on-disk directory structure shouldn't
> result in creating a cycle in the dcache and then deadlocking.

Therein lies the problem: how do you detect such structural defects
without doing a full structure validation? e.g. cyclic links may
only manifest when completely unrelated pieces of metadata are
linked together in a specific way.

Further, the problem is not restricted to validation at mount time -
if the user can write to the filesystem image file, then they can
modify it after it has been mounted, too. That means the attacker
may be someone who has broken into a container, not necessarily the
user you trusted with unprivileged mounts.

That means every cold metadata read needs to be treated with
suspicion, not just those done at mount time.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
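
To illustrate the CRC versus signature point quoted above, here is a
minimal Python sketch. It is only an illustration under stated
assumptions: the block layout, the key, and all helper names are
hypothetical, and nothing here reflects any real filesystem's on-disk
format or kernel code. It shows that an attacker who can write the
image can reseal a modified block with a fresh CRC, but cannot
produce a valid keyed tag without the key.

```python
import hashlib
import hmac
import struct
import zlib

# Hypothetical key held only by the trusted side (never stored in the image).
TRUSTED_KEY = b"example-key-not-in-the-image"


def seal_crc(payload: bytes) -> bytes:
    """Append a CRC32 of the payload: an unkeyed integrity check."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def check_crc(block: bytes) -> bool:
    """Return True if the trailing CRC32 matches the payload."""
    payload, stored = block[:-4], struct.unpack("<I", block[-4:])[0]
    return zlib.crc32(payload) == stored


def seal_hmac(payload: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with a secret key."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def check_hmac(block: bytes, key: bytes) -> bool:
    """Return True if the trailing 32-byte tag matches under the given key."""
    payload, tag = block[:-32], block[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


# An attacker who can write the image rewrites the payload and reseals it.
evil = b"inode=2 parent=2 next=2"  # made-up, self-referential metadata
forged_crc = seal_crc(evil)                       # a CRC needs no secret
forged_hmac = seal_hmac(evil, b"attacker-guess")  # attacker lacks TRUSTED_KEY

print(check_crc(forged_crc))                 # forged block passes the CRC check
print(check_hmac(forged_hmac, TRUSTED_KEY))  # forged block fails the keyed check
```

Note that a valid tag only shows the block was written by a key
holder. It says nothing about whether the structure the block
describes is sane, so it does not replace structure validation.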