Daniel,

On Fri, Nov 12, 2021 at 05:25:31PM +1100, Daniel Black wrote:
> On Fri, Nov 12, 2021 at 10:44 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >
> > On 11/11/21 10:28 AM, Jens Axboe wrote:
> > > On 11/11/21 9:55 AM, Jens Axboe wrote:
> > >> On 11/11/21 9:19 AM, Jens Axboe wrote:
> > >>> On 11/11/21 8:29 AM, Jens Axboe wrote:
> > >>>> On 11/11/21 7:58 AM, Jens Axboe wrote:
> > >>>>> On 11/11/21 7:30 AM, Jens Axboe wrote:
> > >>>>>> On 11/10/21 11:52 PM, Daniel Black wrote:
> > >>>>>>>> Would it be possible to turn this into a full reproducer script?
> > >>>>>>>> Something that someone that knows nothing about mysqld/mariadb can just
> > >>>>>>>> run and have it reproduce. If I install the 10.6 packages from above,
> > >>>>>>>> then it doesn't seem to use io_uring or be linked against liburing.
> > >>>>>>>
> > >>>>>>> Sorry Jens.
> > >>>>>>>
> > >>>>>>> Hope containers are ok.
> > >>>>>>
> > >>>>>> Don't think I have a way to run that, don't even know what podman is
> > >>>>>> and nor does my distro. I'll google a bit and see if I can get this
> > >>>>>> running.
> > >>>>>>
> > >>>>>> I'm fine building from source and running from there, as long as I
> > >>>>>> know what to do. Would that make it any easier? It definitely would
> > >>>>>> for me :-)
> > >>>>>
> > >>>>> The podman approach seemed to work,
>
> Thanks for bearing with it.
>
> > >>>>> and I was able to run all three
> > >>>>> steps. Didn't see any hangs. I'm going to try again dropping down
> > >>>>> the innodb pool size (box only has 32G of RAM).
> > >>>>>
> > >>>>> The storage can do a lot more than 5k IOPS, I'm going to try ramping
> > >>>>> that up.
>
> Good.
>
> > >>>>>
> > >>>>> Does your reproducer box have multiple NUMA nodes, or is it a single
> > >>>>> socket/nod box?
>
> It was NUMA. Pre 5.14.14 I could produce it on a simpler test on a single node.
>
> > >>>>
> > >>>> Doesn't seem to reproduce for me on current -git. What file system are
> > >>>> you using?
>
> Yes ext4.
>
> > >>>
> > >>> I seem to be able to hit it with ext4, guessing it has more cases that
> > >>> punt to buffered IO. As I initially suspected, I think this is a race
> > >>> with buffered file write hashing. I have a debug patch that just turns
> > >>> a regular non-numa box into multi nodes, may or may not be needed be
> > >>> needed to hit this, but I definitely can now. Looks like this:
> > >>>
> > >>> Node7 DUMP
> > >>> index=0, nr_w=1, max=128, r=0, f=1, h=0
> > >>> w=ffff8f5e8b8470c0, hashed=1/0, flags=2
> > >>> w=ffff8f5e95a9b8c0, hashed=1/0, flags=2
> > >>> index=1, nr_w=0, max=127877, r=0, f=0, h=0
> > >>> free_list
> > >>> worker=ffff8f5eaf2e0540
> > >>> all_list
> > >>> worker=ffff8f5eaf2e0540
> > >>>
> > >>> where we seed node7 in this case having two work items pending, but the
> > >>> worker state is stalled on hash.
> > >>>
> > >>> The hash logic was rewritten as part of the io-wq worker threads being
> > >>> changed for 5.11 iirc, which is why that was my initial suspicion here.
> > >>>
> > >>> I'll take a look at this and make a test patch. Looks like you are able
> > >>> to test self-built kernels, is that correct?
>
> I've been libreating prebuilt kernels, however on the path to self-built again.
>
> Just searching for the holy penguin pee (from yaboot da(ze|ys)) to
> peesign(sic) EFI kernels.
> jk, working through docs:
> https://docs.fedoraproject.org/en-US/quick-docs/kernel/build-custom-kernel/
>
> > >> Can you try with this patch? It's against -git, but it will apply to
> > >> 5.15 as well.
> > >
> > > I think that one covered one potential gap, but I just managed to
> > > reproduce a stall even with it. So hang on testing that one, I'll send
> > > you something more complete when I have confidence in it.
> >
> > Alright, give this one a go if you can. Against -git, but will apply to
> > 5.15 as well.
>
> Applied, built, attempting to boot....

If you want to do the same on a Debian-based system, the following
might help to get the package built:

https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s4.2.2

Otherwise, I might be able to provide you with a prebuilt package with the
patch (unsigned though, but best if you build and test it directly).

Regards,
Salvatore