On Tue, 11 Jul 2023, Kent Overstreet wrote:

> On Tue, Jul 11, 2023 at 04:44:39PM -0700, Darrick J. Wong wrote:
> > On Tue, Jul 11, 2023 at 05:51:42PM +0200, Mikulas Patocka wrote:
> > > When I run the test 558 on bcachefs, it works like a fork-bomb and kills
> > > the machine. The reason is that the "while" loop spawns "create_file"
> > > subprocesses faster than they are able to complete.
> > >
> > > This patch fixes the crash by limiting the number of subprocesses to 128.
> > >
> > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > >
> > > ---
> > >  tests/generic/558 |    1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > Index: xfstests-dev/tests/generic/558
> > > ===================================================================
> > > --- xfstests-dev.orig/tests/generic/558
> > > +++ xfstests-dev/tests/generic/558
> > > @@ -48,6 +48,7 @@ echo "Create $((loop * file_per_dir)) fi
> > >  while [ $i -lt $loop ]; do
> > >  	create_file $SCRATCH_MNT/testdir $file_per_dir $i >>$seqres.full 2>&1 &
> > >  	let i=$i+1
> > > +	if [ $((i % 128)) = 0 ]; then wait; fi
> >
> > Hm.  $loop is (roughly) the number of free inodes divided by 1000.  This
> > test completes nearly instantly on XFS; how many free inodes does
> > bcachefs report after _scratch_mount?
> >
> > XFS reports ~570k inodes, so it's "only" starting 570 processes.
> >
> > I think it's probably wise to clamp $loop to something sane, but let's
> > get to the bottom of how the math went wrong and we got a forkbomb.
>
> It's because:
>  - bcachefs doesn't even report a maximum number of inodes (IIRC);
>    inodes are small and variable size (most fields are varints, typical
>    inode size is 50-100 bytes).
>
>  - and the kernel has a sysctl to limit the maximum number of open
>    files, and it's got a sane default (this is what's supposed to save
>    us from pinned inodes eating up all RAM), but systemd conveniently
>    overwrites it to some absurd value...
>
> I'd prefer to see this fixed properly, rather than just "fixing" the
> test; userspace being able to pin all kernel memory this easily is a
> real bug.
>
> We could put a second hard cap on the maximum number of open files, and
> base that on a percentage of total memory; a VFS inode is somewhere in
> the ballpark of a kilobyte, so it's easy enough to calculate. And we
> could make that percentage itself a sysctl, for the people who are
> really crazy...

If we hit the limit of total open files, we have already killed the
system. At that point the user can't execute any program, because
executing a program requires opening files.

I think it is possible to set up cgroups so that a process inside a
cgroup can't kill the machine by exhausting resources. But distributions
don't do it, and they don't do it for the root user (the test runs as
root).

Mikulas
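
For reference, a minimal sketch of the kind of cgroup confinement the
last paragraph refers to, assuming cgroup v2 is mounted at
/sys/fs/cgroup with the memory and pids controllers available; the
group name "fstests" and the limit values are only illustrative, not a
recommendation:

    # create a cgroup for the test run and enable the controllers for it
    mkdir /sys/fs/cgroup/fstests
    echo "+memory +pids" > /sys/fs/cgroup/cgroup.subtree_control

    # cap memory usage and the number of tasks the group may create
    echo 4G   > /sys/fs/cgroup/fstests/memory.max
    echo 2048 > /sys/fs/cgroup/fstests/pids.max

    # move the current shell into the group; everything it forks
    # (including the create_file subprocesses) inherits the limits
    echo $$ > /sys/fs/cgroup/fstests/cgroup.procs

This contains the fork bomb even when run as root, although it does not
address the underlying point that pinned inodes shouldn't be able to eat
all kernel memory in the first place. (On a systemd-managed box,
something like "systemd-run --scope -p MemoryMax=4G -p TasksMax=2048
./check generic/558" would be the less intrusive way to get a similar
effect.)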