On Mon, Jun 12, 2023 at 03:45:12AM -0500, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
> > On Sun, Jun 11, 2023 at 10:49 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >>
> >> On Sun, Jun 11, 2023 at 10:34:29PM -0700, Linus Torvalds wrote:
> >> >
> >> > So that "!=" should obviously have been a "==".
> >>
> >> Same as without the condition - all the fsstress tasks hang in
> >> do_coredump().
> >
> > Ok, that at least makes sense. Your "it made things worse" made me go
> > "What?" until I noticed the stupid backwards test.
> >
> > I'm not seeing anything else that looks odd in that commit
> > f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps
> > regression").
> >
> > Let's see if somebody else goes "Ahh" when they wake up tomorrow...
>
> It feels like there have been about half a dozen bugs pointed out in
> that version of the patch. I am going to have to sleep before I can get
> as far as "Ahh"
>
> One thing that really stands out for me is:
>
>     if (test_if_loop_should_continue) {
>         set_current_state(TASK_INTERRUPTIBLE);
>         schedule();
>     }
>
>     /* elsewhere */
>     llist_add(...);
>     wake_up_process()
>
> So it is possible that the code can sleep indefinitely waiting for a
> wake-up that has already come, because set_current_state and the test
> are in the wrong order.
>
> Unfortunately I don't see what would affect a coredump on a process that
> does not trigger the vhost_worker code.
>
> About the only thing I can imagine is if io_uring is involved. Some of
> the PF_IO_WORKER code was changed, and the test
> "((t->flags & (PF_USER_WORKER | PF_IO_WORKER)) != PF_USER_WORKER)"
> was sprinkled around.
>
> That is the only code outside of vhost specific code that was changed.
>
> Is io_uring involved in the cases that hang?

Oh, right, I brought io_uring into fstests' fsstress.c, and I built the kernel
with CONFIG_IO_URING=y.
If Darrick (who said he didn't hit this issue) didn't enable io_uring, that
might mean it's io_uring related.

>
> Eric
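
For reference, the lost wake-up that Eric describes can be illustrated with a small userspace analogy. This is only a sketch under assumptions: pthreads stand in for the kernel primitives, and the names work_pending, queue_work(), worker() and run_demo() are hypothetical, not taken from the vhost code. The point it tries to show is that a wake-up sent before the sleeper starts waiting is not lost as long as the test and the sleep cannot be separated by the wake-up.

```c
/* Userspace analogy only; none of this is the vhost_worker code. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool work_pending; /* stands in for a non-empty llist */

/*
 * Producer side: publish the work, then wake the worker.
 * Roughly llist_add() followed by wake_up_process() in the quoted sketch.
 */
static void queue_work(void)
{
    pthread_mutex_lock(&lock);
    work_pending = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}

/*
 * Worker side: the condition is tested while holding the lock, and
 * pthread_cond_wait() releases the lock and sleeps atomically, so a
 * wake-up cannot slip in between the test and the sleep.
 *
 * The in-kernel idiom gets the same guarantee by ordering:
 *     set_current_state(TASK_INTERRUPTIBLE);
 *     if (!condition)
 *         schedule();
 *     __set_current_state(TASK_RUNNING);
 * i.e. the task state is set before the test, not after it.
 */
static void *worker(void *arg)
{
    int *handled = arg;

    pthread_mutex_lock(&lock);
    while (!work_pending)
        pthread_cond_wait(&cond, &lock);
    work_pending = false;
    (*handled)++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * Hypothetical driver: the work is queued, and the wake-up sent, before
 * the worker thread even exists. The worker still sees the work and
 * returns instead of sleeping forever.
 */
static int run_demo(void)
{
    pthread_t tid;
    int handled = 0;

    queue_work();
    if (pthread_create(&tid, NULL, worker, &handled) != 0)
        return -1;
    pthread_join(tid, NULL);
    return handled;
}
```

Calling run_demo() from a small main() (built with -pthread) should return once the worker has handled the single queued item. If the worker slept first and only tested work_pending afterwards, the same early wake-up would leave it blocked, which is the shape of the hang in the quoted snippet.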