Re: [PATCH review 08/12] quota: Ensure qids map to the filesystem

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 13 Jul 2016 11:34:36 +1000

On Mon, Jul 11, 2016 at 01:12:49PM -0500, Eric W. Biederman wrote:
> The place where I am concerned about thorough review and testing is
> someone poisoning quota files and then the kernel trying to use them.
> In the preliminary work we have done in other places in the kernel and
> for other filesystems there almost always winds up being some way to
> confuse the kernel and get it to misbehave if you can poison the disk
> based inputs.  As poison disk based inputs is not something filesystems
> are strongly concerned about.  In most cases the disk the filesystem
> resides on is in the box and therefore under control of the OS at all
> times.  Dave Chinner has even said he will never consider handling
> poisoned disk based inputs for XFS as the run time cost is too high.

I didn't say that. I said that comprehensive checks to catch all
possible malicious inputs are too expensive to be considered a viable
solution for allowing user-mounts of arbitrary filesystem images
through the kernel.

We already have runtime validation that bounds-checks most on-disk
fields when they are read - that deals with fuzz based poisoning
testing quite well and provides some protection against directed
structural attacks as well. IOWs, we already handle a large scope of
poisoned inputs safely, but it's not comprehensive because we can't
easily determine cross-object reference validity within the
format-determined limits.

e.g. we can check that the number of records in a btree block is
within the valid bounds of a block, but we cannot determine that the
record count has been incremented by 1 and a bogus record has been
inserted somewhere inside the block and the CRC recalculated to
match the modification. We can also check the records themselves for
being within bounds (e.g. we can check a freespace record points to a
block within range and of valid length) but we can't check that the
extent is actually free space. That requires doing a full filesystem
traversal to determine if the extent is actually free space or not.
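
As a rough illustration of that gap, here is a toy Python sketch. It is
not XFS code and does not use the XFS on-disk format: the block layout,
FS_BLOCKS, MAX_RECS, pack_block() and verify_block() are all made up for
the example. It only shows that a forged record which stays within the
format limits, with the CRC recalculated, passes every local check,
while a fuzzed out-of-range record is rejected.

```python
import struct
import zlib

FS_BLOCKS = 1000   # hypothetical filesystem size in blocks
MAX_RECS = 4       # hypothetical per-block record limit


def pack_block(records):
    """Serialise (start, length) free-space records and append a CRC32."""
    body = struct.pack("<I", len(records))
    for start, length in records:
        body += struct.pack("<II", start, length)
    return body + struct.pack("<I", zlib.crc32(body))


def verify_block(buf):
    """Local checks only: CRC, record count, per-record bounds."""
    body = buf[:-4]
    (crc,) = struct.unpack("<I", buf[-4:])
    if zlib.crc32(body) != crc:
        return False
    (count,) = struct.unpack_from("<I", body, 0)
    if count > MAX_RECS or len(body) != 4 + 8 * count:
        return False
    for i in range(count):
        start, length = struct.unpack_from("<II", body, 4 + 8 * i)
        if length == 0 or start + length > FS_BLOCKS:
            return False
    return True


honest = pack_block([(100, 50), (400, 20)])
# An attacker inserts a record claiming blocks 200-209 are free (in this
# toy scenario they are in use) and recalculates the CRC.
forged = pack_block([(100, 50), (200, 10), (400, 20)])
# A fuzzed block whose extent runs past the end of the filesystem.
fuzzed = pack_block([(990, 50)])

print(verify_block(honest), verify_block(forged), verify_block(fuzzed))
```

Nothing inside verify_block() can tell the honest block from the forged
one, because the answer depends on what every other object in the
filesystem says about blocks 200-209.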

Of course, we could look up the rmap tree (if we have one) to
determine if the space is actually used or not, but an attacker can
also insert/remove records in that tree, too, so if we can't trust
the free space tree, we can't trust the rmap tree either.
Hence we have to fall back to brute force validation if we
want to be certain that the metadata has not been tampered with.

To bring this back to quota files, the only way to validate that a
quota file has not been tampered with is to run a quotacheck on the
filesystem once it has been mounted. This requires visiting every
inode in the filesystem, so it is an expensive operation. Only XFS has
this functionality in kernel, so for untrusted mounts we could
simply run it on every mount that has quotas enabled. Of course,
users won't care that mounting their filesystem now takes several
minutes (hours, even, when we have millions of inodes in the fs)
while these checks are run...
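
To make the cost concrete, a quotacheck-style pass can be sketched as
below. This is a simplified Python model, not the XFS implementation:
quotacheck(), the (owner id, blocks) inode tuples and the dict standing
in for the quota file are assumptions made for illustration. The point
is that the recorded usage can only be confirmed by summing over every
inode, so the work grows with the number of inodes in the filesystem.

```python
from collections import defaultdict


def quotacheck(inodes, quota_file):
    """Recompute per-owner block usage from every inode and report each
    owner whose recorded usage differs, as (recorded, actual)."""
    actual = defaultdict(int)
    for owner, blocks in inodes:          # one visit per inode
        actual[owner] += blocks
    mismatched = {}
    for owner in set(actual) | set(quota_file):
        recorded = quota_file.get(owner, 0)
        counted = actual.get(owner, 0)
        if recorded != counted:
            mismatched[owner] = (recorded, counted)
    return mismatched


# Hypothetical data: three inodes, and a quota file in which owner 1000
# has had its usage reset to zero.
inodes = [(1000, 8), (1000, 16), (1001, 4)]
tampered = {1000: 0, 1001: 4}

print(quotacheck(inodes, tampered))
```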

Malicious corruptions that specifically manipulate the
on-disk structure within the bounds of format validity are difficult
to detect and costly to protect against. We'd need to move large
parts of fsck into the kernel and run it to validate every piece of
metadata read into the kernel. Then we've got a much larger attack
surface in the kernel (all the validity checking code needs to be
robust against invalid structures, too!), a lot more complexity
(more bugs!) and a lot of additional runtime overhead (slow
filesystem = unhappy users!). It's just not a practical solution to
the problem.

> Between actually finding issues that can cause problems, and the
> increased amount of kernel code executed (and thus the increase in
> kernel attack surface) I am very paranoid about enabling code that
> trusts data that could be poisoned data from a hostile party.
>
> At the same time I am very uncomfortable with the fact the kernel does
> not protect against malicious disks and poisoned disk images.  As
> poisoned disk images are a well known exploit vector in the wild.  A
> well known and demonstrated attack vector that works is to leave a usb
> stick in a public place, and helpful people will place it into their
> computer to try and figure out who it belongs to.  In trying to be
> helpful their computer will unbeknownst to them start executing code
> that does not serve the interests of the computer owner.  I hate that we
> can not currently protect people from shenanigans like that.

Yes, we know all about these problems. Unfortunately someone
appears to not be listening when they are being repeatedly told that
hardening all the kernel filesystem implementations against poisoned
images is simply not a viable solution to the problem.

Move the parsing of untrusted structures out of the kernel - work
with the various filesystem teams to build viable FUSE
implementations (where it's much easier to incorporate parts of the
userspace fsck code) and provide the FUSE filesystems to container
users wanting to mount their own filesystem images.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx