On Mon, Jun 29, 2020 at 07:27:40PM +0200, Dmitry Vyukov wrote:
> On Mon, Jun 29, 2020 at 4:42 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >
> > On Sun, Jun 28, 2020 at 12:25 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > On Sat, Jun 27, 2020 at 09:02:05PM +0800, Hillf Danton wrote:
> > > > > So, to hit this syzkaller one of these must have happened:
> > > > >  1) rdma_addr_cancel() didn't work and the process_one_work() is still
> > > > >     runnable/running
> > > >
> > > > What syzbot reported indicates that the kworker did survive not only
> > > > canceling work but the handler_mutex, despite it's a sync cancel that
> > > > waits for the work to complete.
> > >
> > > The syzbot report doesn't confirm that the cancel work was actually
> > > called.
> > >
> > > The most likely situation is that it was skipped because of the state
> > > mangling the patch fixes.
> > >
> > > > >  2) The state changed away from RDMA_CM_ADDR_QUERY without doing
> > > > >     rdma_addr_cancel()
> > > >
> > > > The cancel does cover the query state in the reported case, and have
> > > > difficult time working out what's in the patch below preventing the
> > > > work from going across the line the sync cancel draws. That's the
> > > > question we can revisit once there is a reproducer available.
> > >
> > > rdma-cm never seems to get reproducers from syzkaller
> >
> > +syzkaller mailing list
> >
> > Hi Jason,
> >
> > Wonder if there is some systematic issue. Let me double check.
>
> By scanning bugs at:
> https://syzkaller.appspot.com/upstream
> https://syzkaller.appspot.com/upstream/fixed
>
> I found a significant number of bugs that I would qualify as "rdma-cm"
> and that have reproducers. Here is an incomplete list (I did not get
> to the end):
>
> https://syzkaller.appspot.com/bug?id=b8febdb3c7c8c1f1b606fb903cee66b21b2fd02f
> https://syzkaller.appspot.com/bug?id=d5222b3e1659e0aea19df562c79f216515740daa
> https://syzkaller.appspot.com/bug?id=c600e111223ce0a20e5f2fb4e9a4ebdff54d7fa6
> https://syzkaller.appspot.com/bug?id=a9796acbdecc1b2ba927578917755899c63c48af
> https://syzkaller.appspot.com/bug?id=95f89b8fb9fdc42e28ad586e657fea074e4e719b
> https://syzkaller.appspot.com/bug?id=8dc0bcd9dd6ec915ba10b3354740eb420884acaa
> https://syzkaller.appspot.com/bug?id=805ad726feb6910e35088ae7bbe61f4125e573b7
> https://syzkaller.appspot.com/bug?id=56b60fb3340c5995373fe5b8eae9e8722a012fc4
> https://syzkaller.appspot.com/bug?id=38d36d1b26b4299bf964d50af4d79688d39ab960
> https://syzkaller.appspot.com/bug?id=25e00dd59f31783f233185cb60064b0ab645310f
> https://syzkaller.appspot.com/bug?id=2f38d7e5312fdd0acc979c5e26ef2ef8f3370996
>
> Do you mean some specific subset of bugs by "rdma-cm"? If yes, what is
> that subset?

The race condition bugs never seem to get reproducers. I checked a few
of the above, and these are much more deterministic things.

I think the recurrence rate for the races is probably too low?

Jason