On 11/11/21 8:29 AM, Jens Axboe wrote:
> On 11/11/21 7:58 AM, Jens Axboe wrote:
>> On 11/11/21 7:30 AM, Jens Axboe wrote:
>>> On 11/10/21 11:52 PM, Daniel Black wrote:
>>>>> Would it be possible to turn this into a full reproducer script?
>>>>> Something that someone that knows nothing about mysqld/mariadb can just
>>>>> run and have it reproduce. If I install the 10.6 packages from above,
>>>>> then it doesn't seem to use io_uring or be linked against liburing.
>>>>
>>>> Sorry Jens.
>>>>
>>>> Hope containers are ok.
>>>
>>> Don't think I have a way to run that, don't even know what podman is
>>> and nor does my distro. I'll google a bit and see if I can get this
>>> running.
>>>
>>> I'm fine building from source and running from there, as long as I
>>> know what to do. Would that make it any easier? It definitely would
>>> for me :-)
>>
>> The podman approach seemed to work, and I was able to run all three
>> steps. Didn't see any hangs. I'm going to try again dropping down
>> the innodb pool size (box only has 32G of RAM).
>>
>> The storage can do a lot more than 5k IOPS, I'm going to try ramping
>> that up.
>>
>> Does your reproducer box have multiple NUMA nodes, or is it a single
>> socket/node box?
>
> Doesn't seem to reproduce for me on current -git. What file system are
> you using?

I seem to be able to hit it with ext4, guessing it has more cases that
punt to buffered IO. As I initially suspected, I think this is a race
with buffered file write hashing. I have a debug patch that just turns
a regular non-NUMA box into multiple nodes, may or may not be needed to
hit this, but I definitely can now. Looks like this:

Node7 DUMP
index=0, nr_w=1, max=128, r=0, f=1, h=0
  w=ffff8f5e8b8470c0, hashed=1/0, flags=2
  w=ffff8f5e95a9b8c0, hashed=1/0, flags=2
index=1, nr_w=0, max=127877, r=0, f=0, h=0
free_list
  worker=ffff8f5eaf2e0540
all_list
  worker=ffff8f5eaf2e0540

where we see node7 in this case having two work items pending, but the
worker state is stalled on hash.

The hash logic was rewritten as part of the io-wq worker threads being
changed for 5.11 iirc, which is why that was my initial suspicion here.

I'll take a look at this and make a test patch. Looks like you are able
to test self-built kernels, is that correct?

-- 
Jens Axboe