Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Sun, Jun 11, 2023 at 10:49 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>
>> On Sun, Jun 11, 2023 at 10:34:29PM -0700, Linus Torvalds wrote:
>> >
>> > So that "!=" should obviously have been a "==".
>>
>> Same as without the condition - all the fsstress tasks hang in
>> do_coredump().
>
> Ok, that at least makes sense. Your "it made things worse" made me go
> "What?" until I noticed the stupid backwards test.
>
> I'm not seeing anything else that looks odd in that commit
> f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps
> regression").
>
> Let's see if somebody else goes "Ahh" when they wake up tomorrow...

It feels like there have been about half a dozen bugs pointed out in
that version of the patch.  I am going to have to sleep before I can
get as far as "Ahh".

One thing that really stands out for me is:

	if (test_if_loop_should_continue) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule();
	}

	/* elsewhere */
	llist_add(...);
	wake_up_process()

So it is possible that the code can sleep indefinitely waiting for a
wake-up that has already come, because set_current_state and the test
are in the wrong order.

Unfortunately I don't see what would affect a coredump on a process
that does not trigger the vhost_worker code.

About the only thing I can imagine is if io_uring is involved.  Some of
the PF_IO_WORKER code was changed, and the test
"((t->flags & (PF_USER_WORKER | PF_IO_WORKER)) != PF_USER_WORKER)"
was sprinkled around.

That is the only code outside of vhost specific code that was changed.

Is io_uring involved in the cases that hang?

Eric
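
For illustration only, here is a minimal userspace sketch of the ordering
issue described above.  It is not the vhost code: it uses a pthread mutex
and condition variable as a stand-in, and struct work_queue, worker(),
queue_work() and run_demo() are all hypothetical names.  The point it
tries to show is that the "is there work?" test and the decision to sleep
must not be separable by a concurrent enqueue plus wake-up.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a worker's pending-work list. */
struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	int pending;	/* items queued but not yet consumed */
	int consumed;	/* items the worker has processed */
	bool stop;	/* set once no more work will be queued */
};

static void *worker(void *arg)
{
	struct work_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		/*
		 * The test happens with the lock held, and
		 * pthread_cond_wait() releases the lock and sleeps
		 * atomically, so a wake-up cannot fall between the
		 * test and the sleep.  The kernel analogue is to call
		 * set_current_state() first, then test, then schedule().
		 */
		while (q->pending == 0 && !q->stop)
			pthread_cond_wait(&q->wake, &q->lock);
		if (q->pending == 0 && q->stop)
			break;
		q->consumed += q->pending;
		q->pending = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Plays the role of the "elsewhere" side: add work, then wake. */
static void queue_work(struct work_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->pending++;
	pthread_cond_signal(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

/* Queue nitems items and return how many the worker consumed. */
int run_demo(int nitems)
{
	struct work_queue q = { .pending = 0, .consumed = 0, .stop = false };
	pthread_t tid;

	pthread_mutex_init(&q.lock, NULL);
	pthread_cond_init(&q.wake, NULL);
	if (pthread_create(&tid, NULL, worker, &q) != 0)
		return -1;

	for (int i = 0; i < nitems; i++)
		queue_work(&q);

	pthread_mutex_lock(&q.lock);
	q.stop = true;
	pthread_cond_signal(&q.wake);
	pthread_mutex_unlock(&q.lock);

	pthread_join(tid, NULL);
	pthread_cond_destroy(&q.wake);
	pthread_mutex_destroy(&q.lock);
	return q.consumed;
}
```

Calling run_demo(n) should return n for any n >= 0, since every queued
item is either seen by the test or followed by a wake-up the worker
cannot miss.  Build with -pthread.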