On Thu, Sep 16, 2021 at 04:45:27PM +0200, Dmitry Vyukov wrote:

> It looks like a very hard to trigger race (few crashes, no reproducer,
> but KASAN reports look sensible). That's probably the reason syzkaller
> can't create a reproducer.
> From the log it looks like it was triggered by one of these programs
> below. But I tried to reproduce manually and had no success.
> We are currently doing some improvements to race triggering code in
> syzkaller, and may try to use this as a litmus test to see if
> syzkaller will do any better:
> https://github.com/google/syzkaller/issues/612#issuecomment-920961538

I would suggest looking at this:

https://patchwork.kernel.org/project/linux-rdma/patch/0-v1-9fbb33f5e201+2a-cma_listen_jgg@xxxxxxxxxx/

Which I think should be completely deterministic, just do the RDMA_CM
ops in the right order, but syzbot didn't find a reproducer.

The "healer" fork did however:

https://lore.kernel.org/all/CACkBjsY-CNzO74XGo0uJrcaZTubC+Yw9Sg1bNNi+evUOGaZTCg@xxxxxxxxxxxxxx/#r

> Answering your question re what was running concurrently with what.
> Each of the syscalls in these programs can run up to 2 times and
> ultimately any of these calls can race with any. Potentially syzkaller
> can predict values kernel will return (e.g. id's) before kernel
> actually returned them. I guess this does not restrict search area for
> the bug a lot...

Well, it does help if it is only those system calls.

And I think I can discount the workqueue as a problem, as I'd expect a
KASAN hit on the 'req' allocation if the workqueue was malfunctioning -
thus I must conclude we are not calling work cancelation for some
reason.

Jason