On Mon, May 6, 2019 at 9:15 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> Umm... Where would you put the cutoff for try_dget()? 1G? Because
> 2G-<something relatively small> is risky - have it reached, then
> get the rest of the way to 2G by normal dget() and you've got trouble.

I'd make the limit be 2G, exactly like the page count.

Negative counts are fine - they work exactly like large integers. It's
only 0 that is special.

So do something like this:

 - make dget() WARN_ONCE(), and perhaps set a flag to start background
   dentry pruning, if the dentry count is negative ("big integer") after
   the lockref_get()

 - add a try_dget(), which returns the dentry or NULL (and is
   "must_check") and just refuses to increment the ref past the 2G mark

 - add the "limit negative dentries" patches that were already written
   for other reasons by Waiman Long

 - and exactly like the page ref count, the negative values can be
   tested non-atomically without worrying about races, because it's not
   a "hard" limit. It takes a *looong* time (and a lot of memory) to go
   from 2G to actually overflowing

 - for the same "not a hard limit" reason, use try_dget() in a couple of
   strategic places that are easy to error out for and that are
   particularly easily user-triggerable. It's not clear this is even
   needed, since the only obviously user-triggerable case is the
   negative-dentry one - everything else really needs an actual user
   ref, and the soft "start to try to prune if any dentry ref goes
   negative" will take care of the "we just have a ton of unused but
   cached dentries" case.

All pretty much exactly like the page count. The fact that we have that
"slop" of 2 _billion_ references between "oh, the refcount went
negative" and "oops, now we overflowed and that would be fatal" really
means that we have a lot of time and flexibility to handle things.

If an attacker has to open two billion files, the attacker is going to
spend a lot of time that we can mitigate.

                 Linus
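
As a rough userspace model of the scheme sketched above (not actual
kernel code): the names mock_dentry, mock_dget() and mock_try_dget()
are made up for illustration, and a plain C11 atomic stands in for the
kernel's lockref. The only point is the split between "warn and keep
going when the count goes negative", "try_dget() refuses past the 2G
mark", and "the negative test can be non-atomic because of the 2G of
slop":

/*
 * Minimal userspace sketch of the refcount scheme described above.
 * All names here are hypothetical stand-ins, not kernel APIs.
 */
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct mock_dentry {
	atomic_int d_count;	/* stand-in for d_lockref.count */
};

/*
 * "Hard" dget(): always takes the reference, but warns (once) when the
 * count has gone negative, i.e. crossed the 2G mark.  In the kernel
 * this would be a WARN_ONCE() plus perhaps a flag that kicks off
 * background dentry pruning.
 */
static struct mock_dentry *mock_dget(struct mock_dentry *d)
{
	static bool warned;
	int count = atomic_fetch_add(&d->d_count, 1) + 1;

	if (count < 0 && !warned) {
		warned = true;
		fprintf(stderr, "dentry count went negative (%d): start pruning\n",
			count);
	}
	return d;
}

/*
 * try_dget(): refuses to take a reference once the count has already
 * crossed the 2G mark.  The check is deliberately non-atomic with the
 * increment: there are ~2 billion increments of slop between "negative"
 * and actually wrapping, so this is a soft limit, not a hard one.
 */
static __attribute__((warn_unused_result))
struct mock_dentry *mock_try_dget(struct mock_dentry *d)
{
	if (atomic_load(&d->d_count) < 0)
		return NULL;
	atomic_fetch_add(&d->d_count, 1);
	return d;
}

int main(void)
{
	struct mock_dentry d = { .d_count = 5 };

	/* Normal path: both helpers take the reference. */
	mock_dget(&d);
	if (!mock_try_dget(&d))
		return EXIT_FAILURE;

	/* Simulate a count that has already crossed the 2G mark. */
	atomic_store(&d.d_count, INT_MIN + 100);
	mock_dget(&d);			/* warns, but still succeeds */
	if (!mock_try_dget(&d))		/* refuses: caller must error out */
		printf("try_dget refused past the 2G mark\n");

	return 0;
}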