On 11/11/21 9:55 AM, Jens Axboe wrote: > On 11/11/21 9:19 AM, Jens Axboe wrote: >> On 11/11/21 8:29 AM, Jens Axboe wrote: >>> On 11/11/21 7:58 AM, Jens Axboe wrote: >>>> On 11/11/21 7:30 AM, Jens Axboe wrote: >>>>> On 11/10/21 11:52 PM, Daniel Black wrote: >>>>>>> Would it be possible to turn this into a full reproducer script? >>>>>>> Something that someone that knows nothing about mysqld/mariadb can just >>>>>>> run and have it reproduce. If I install the 10.6 packages from above, >>>>>>> then it doesn't seem to use io_uring or be linked against liburing. >>>>>> >>>>>> Sorry Jens. >>>>>> >>>>>> Hope containers are ok. >>>>> >>>>> Don't think I have a way to run that, don't even know what podman is >>>>> and nor does my distro. I'll google a bit and see if I can get this >>>>> running. >>>>> >>>>> I'm fine building from source and running from there, as long as I >>>>> know what to do. Would that make it any easier? It definitely would >>>>> for me :-) >>>> >>>> The podman approach seemed to work, and I was able to run all three >>>> steps. Didn't see any hangs. I'm going to try again dropping down >>>> the innodb pool size (box only has 32G of RAM). >>>> >>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping >>>> that up. >>>> >>>> Does your reproducer box have multiple NUMA nodes, or is it a single >>>> socket/nod box? >>> >>> Doesn't seem to reproduce for me on current -git. What file system are >>> you using? >> >> I seem to be able to hit it with ext4, guessing it has more cases that >> punt to buffered IO. As I initially suspected, I think this is a race >> with buffered file write hashing. I have a debug patch that just turns >> a regular non-numa box into multi nodes, may or may not be needed be >> needed to hit this, but I definitely can now. Looks like this: >> >> Node7 DUMP >> index=0, nr_w=1, max=128, r=0, f=1, h=0 >> w=ffff8f5e8b8470c0, hashed=1/0, flags=2 >> w=ffff8f5e95a9b8c0, hashed=1/0, flags=2 >> index=1, nr_w=0, max=127877, r=0, f=0, h=0 >> free_list >> worker=ffff8f5eaf2e0540 >> all_list >> worker=ffff8f5eaf2e0540 >> >> where we seed node7 in this case having two work items pending, but the >> worker state is stalled on hash. >> >> The hash logic was rewritten as part of the io-wq worker threads being >> changed for 5.11 iirc, which is why that was my initial suspicion here. >> >> I'll take a look at this and make a test patch. Looks like you are able >> to test self-built kernels, is that correct? > > Can you try with this patch? It's against -git, but it will apply to > 5.15 as well. I think that one covered one potential gap, but I just managed to reproduce a stall even with it. So hang on testing that one, I'll send you something more complete when I have confidence in it. -- Jens Axboe