Re: Report bug to Linux ext4 file system about inode

"Theodore Ts'o" <tytso@xxxxxxx> · Sun, 5 Sep 2021 10:36:31 -0400

On Sun, Sep 05, 2021 at 09:29:45PM +0800, 肖杰韬 wrote:
> Hi, our team has found a problem in ext4 system on Linux kernel
> v5.10, leading to DoS attacks.
> 
> The struct inode can be exhausted by normal users by calling syscall
> such as creat. A normal user can repeatedly make the creat syscalls
> to create files and exhaust all struct inode. As a result, although
> there is still a lot of space in the disk, there are no available
> inodes and all ext4 files/directories creation of all other users
> will fail.

You can use project quotas to control the number of blocks and inodes
that are used under a particular directory hierarchy.  So if a
particular container is chroot'ed to the top level of a directory
hierarchy that has a project quota, you can control the amount of file
system resources used by that container.
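
As a rough illustration of what that setup involves, here is a minimal
sketch that only builds the command lines and does not run anything.
It assumes e2fsprogs and quota-tools versions with project quota
support, and that the file system is mounted with project quota
enforcement enabled.  The device, mount point, directory, project ID,
and limits are all hypothetical placeholders.

```python
import shlex


def project_quota_commands(device, mount_point, directory,
                           project_id, block_hard_kib, inode_hard):
    """Return argv lists that would put `directory` under a project quota.

    Nothing is executed; the caller decides whether and how to run them.
    """
    pid = str(project_id)
    return [
        # Enable the project feature and project quota accounting.
        # (Assumption: run against an unmounted file system.)
        ["tune2fs", "-O", "project", "-Q", "prjquota", device],
        # Tag the hierarchy with the project ID and set the inheritance
        # flag so that newly created files pick up the same ID.
        ["chattr", "-R", "-p", pid, "+P", directory],
        # Soft limits of 0 mean "no soft limit"; only hard limits are set.
        ["setquota", "-P", pid,
         "0", str(block_hard_kib), "0", str(inode_hard), mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical values: project 42, 1 GiB of blocks, 10000 inodes.
    for argv in project_quota_commands("/dev/sdX1", "/srv", "/srv/ctr1",
                                       42, 1048576, 10000):
        print(shlex.join(argv))
```

The inode hard limit is the part that addresses the report: once the
project reaches it, further file creation under that hierarchy fails
for that project only, not for everyone else on the file system.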

Indeed, project quotas were added to ext4 specifically to address the
issue of different containers sharing a file system potentially using
all of the blocks or inodes in that shared file system.  (See more
below for a discussion of the on-going effort to add various point
solutions for the sake of containers.)  If you are not using
containers, normal user and group quotas would be the appropriate
solution.
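
For reference, the condition described in the report (inodes exhausted
while blocks are still free) can be observed from user space, because
statvfs(3) reports inode counts separately from block counts.  A
minimal sketch, where the helper name is made up for illustration:

```python
import os


def inode_usage(path):
    """Return (total, free, used_fraction) for inodes on the file system
    that holds `path`.

    used_fraction is None when the file system reports no inode count,
    which some file systems do.
    """
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    if total == 0:
        return total, free, None
    return total, free, (total - free) / total


if __name__ == "__main__":
    total, free, used = inode_usage("/")
    print(f"inodes: total={total} free={free} used_fraction={used}")
```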

If you are referring to memory utilization (which is normally what
people refer to when they use terms like "struct inode"), the
appropriate solution is the memcg controller to limit how much memory
can be used by a particular container.
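
As a sketch of what setting such a limit looks like, assuming cgroup
v2 (where the hard limit is the memory.max file in the group's
directory) and a hypothetical group path; the helper below is for
illustration only and is not part of any existing tool:

```python
import os
import tempfile


def set_memory_limit(cgroup_dir, limit_bytes):
    """Write a hard memory limit, in bytes, to a cgroup v2 directory.

    Returns the path of the file that was written.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    path = os.path.join(cgroup_dir, "memory.max")
    with open(path, "w") as f:
        f.write(str(limit_bytes))
    return path


if __name__ == "__main__":
    # A temporary directory stands in for a real group such as
    # /sys/fs/cgroup/ctr1 (hypothetical), which would need privileges.
    demo_dir = tempfile.mkdtemp()
    print(set_memory_limit(demo_dir, 512 * 1024 * 1024))
```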

These techniques are applicable to any file system, and the issue you
raised is not specific to the ext4 file system.  The real issue is the
mistaken belief that containers provide perfect (or some would say,
even adequate) isolation between mutually suspicious users --- and
they do not.

There are people trying to sell the benefits of containers who will
make this claim, and the obvious issues, such as the one you have
identified, do have point solutions.  However, if you are really
concerned about providing iron-clad isolation between two users such
that if one of them is malicious, they cannot affect the other, the
much better solution is to use Virtual Machines.  VM's are not as
efficient, of course, but that is the nature of engineering tradeoffs.

That being said, people who are developing containers do work to patch
up each isolation failure as it comes up, but people need to
understand that there is a certain amount of whack-a-mole[1] that is
happening.  This continuing effort is because of the clear efficiency
gains of containers vs VM's.  But there is a reason why cloud products
such as Google Kubernetes Engine use VM's *plus* containers such that
each GCP Project has exclusive use of a particular VM.  This avoids
the problems where two mutually suspicious customers, such as, for
example, Qingju Bike and Meituan Bike, or Huawei and Samsung, would be
in a position to try to breach the isolation of a pure container-based
system, and cause problems for their competitor(s).

[1] https://en.wikipedia.org/wiki/Whac-A-Mole#Colloquial_usage

Cheers,

					- Ted