On Mon, May 06, 2019 at 06:50:27PM -0700, Linus Torvalds wrote:
> On Mon, May 6, 2019 at 5:40 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Linus, lockref is your code, IIRC; which variant would you consider
> > more feasible?
>
> I think we should strive for the same kind of count overflow handling
> that the pageref patches did: keep the count at 32 bits, but just add
> a new "try_dget()" thing that returns "no" when the count grows too
> large.
>
> And then use the "try_dget()" model when possible, and particularly
> for the easy cases for user mode to trigger. You don't have to catch
> them all, and in most places it isn't worth even worrying about it
> because users can't force billions of those places to be active at
> once.

Umm...  Where would you put the cutoff for try_dget()?  1G?  Because
2G-<something relatively small> is risky - have it reached, then get
the rest of the way to 2G by plain dget() and you've got trouble.

The part that worries me is that we'd get new failure exits that will be
impossible to test without fuckloads of RAM in the testbox; certainly
more than I have ;-/

> I don't see the original email (I'm not on fsdevel, and google doesn't
> find it), so I don't see if there was some particular case that was
> pointed out as being an easy attack vector.

The attack is predicated upon having 2TB of RAM; basically, a forkbomb
with plenty of access(2) on different nonexistent names in the same
directory from each process.  And yes, that can be mitigated by limiting
the number of negative dentries, but I'm pretty sure the same can be
done with positive ones.  E.g. take tmpfs, create a file there, then
link(2) a lot and you've got each child contributing to the parent
directory's dentry refcount.  Or you can open the root directory a lot
(again, on tmpfs), then seek in it - each open and each cursor will
contribute.  That way you need 1G open files, and /proc/sys/fs/file-nr
set that high is not impossible if you have enough RAM.  Will any boxen
be set up that way?  No idea...

Most of the attack vectors going through ->d_parent contributions are
relatively easy to deal with - d_alloc() can fail already, and having
that kind of overflow mapped to -ENOMEM is not a big deal.  Ditto for
d_alloc_cursor().  Callers of __d_move() in d_splice_alias() are not
hard to deal with.

However, d_move() itself is nasty.  We call it in situations where it's
too late to fail.  We _might_ get away with it if we do a variant that
consumes a reference to the new parent and have that code go
	try to reserve an extra reference to the new parent
	sod off if that failed
	do the stuff that might fail
		if failed - drop that extra reference
		otherwise do the new variant of d_move()
That's an extra lock/unlock of the parent's ->d_lock, but I'd expect it
to drown in the noise on those paths...  However, that obviously doesn't
do anything for references held by opened files.  Having complete_walk()
check for the refcount being too high and buggering off in that case
probably would help that one...  Ugh...
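
FWIW, to make the cutoff argument concrete - a sketch only, against the
current struct dentry/lockref layout; try_dget() and DGET_MAX exist
nowhere in the tree, and this is the dumb take-d_lock version rather
than a lockref fast path:

	/*
	 * Sketch, not a real API.  Cutting off around 1G leaves the
	 * entire upper half of the 32bit range as slack, so the plain
	 * dget() callers that have no failure exit can't carry the
	 * count from "just under the cutoff" the rest of the way to
	 * overflow at 2G.
	 */
	#define DGET_MAX	(INT_MAX / 2)

	static struct dentry *try_dget(struct dentry *dentry)
	{
		spin_lock(&dentry->d_lock);
		if (dentry->d_lockref.count >= DGET_MAX) {
			spin_unlock(&dentry->d_lock);
			return NULL;	/* the new failure exit */
		}
		dentry->d_lockref.count++;
		spin_unlock(&dentry->d_lock);
		return dentry;
	}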
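
And the caller pattern for the reference-consuming d_move() variant
would be something like the following (d_move_consume() is a made-up
name, as is the helper it calls):

	/* reserve a reference to the new parent up front */
	if (!try_dget(new_parent))
		return -ENOMEM;		/* or however we map the overflow */

	err = do_the_stuff_that_might_fail();
	if (err) {
		dput(new_parent);	/* drop the reserved reference */
		return err;
	}
	/* past the point of no failure; this one eats our reference */
	d_move_consume(dentry, target);

The extra round-trip on the parent's ->d_lock in try_dget() is the
lock/unlock I mentioned above.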