On Mon, Jul 27, 2020 at 2:06 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > > Therefore make it simpler to get exec correct by freezing the other > threads at the beginning of exec. This removes an entire class of > races, and makes it tractable to fix some of the long standing > issues with exec. I hate the global state part of the freezer. It's also pointless. We don't want to trigger all the tests that various random driver kernel threads do. I also really don't like how now execve() by any random person will suddenly impact everything that might be doing freezing. It also makes for a possible _huge_ latency regression for execve(), since freezing really has never been a very low-latency operation. Other threads doing IO can now basically block execve() for a long long long time. Finally, I think your patch is fundamentally broken for another reason: it depends on CONFIG_FREEZER, and that isn't even required to be set! So no, this is not at all acceptable in that form. Now, maybe we could _make_ it acceptable, by (a) add a per-process freezer count to avoid the global state for this case (b) make a small subset of the freezing code available for the !CONFIG_FREEZER thing (c) fix this "simple freezer" to not actually force wakeups etc, but catch things in the but honestly, at that point nothing of the "CONFIG_FREEZER" code even really exists any more. It would be more of a "execve_synchronize()" thing, where we'd catch things in the scheduler and/or system call entry/exit or whatever. Also, that makes these kinds of nasty hacks that just make the existign freezer code even harder to figure out: > A new function exec_freeze_threads based upon > kernel/power/process.c:try_to_freeze_tasks is added. To play well > with other uses of the kernel freezer it uses a killable sleep wrapped > with freezer_do_not_count/freezer_count. Ugh. Just _ugly_. And honestly, completely and utterly broken. See above. I understand the wish to re-use existing infrastructure. But the fact is, the FREEZER code is just about the _last_ thing you should want to use. That, and stop_machine(), is just too much of a big hammer. Linus