On Sun, May 13, 2018 at 4:29 PM, Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> Eric Biggers wrote:
>> Generally it's best to close syzbot bug reports once the original cause is
>> fixed, so that syzbot can continue to report other bugs with the same signature.
>
> That's difficult to judge. Closing as soon as the original cause is fixed allows
> syzbot to try to report different reproducer for different bugs. But at the same time,
> different/similar bugs which were reported in that report (or comments in the discussion
> for that report) will become almost invisible from users (because users unlikely check
> other reports in already fixed bugs).
>
> An example is
>
> general protection fault in kernfs_kill_sb (2)
> https://syzkaller.appspot.com/bug?id=903af3e08fc7ec60e57d9c9b93b035f4fb038d9a
>
> where the cause of above report was already pointed out in the discussion for
> the below report.
>
> general protection fault in kernfs_kill_sb
> https://syzkaller.appspot.com/bug?id=d7db6ecf34f099248e4ff404cd381a19a4075653
>
> Since the latter is marked as "fixed on May 08 18:30", I worry that quite few
> users would check the relationship.
>
>> Note also that a "workqueue lockup" can be caused by almost anything in the
>> kernel, I think. This one for example is probably in the sound subsystem:
>> https://syzkaller.appspot.com/text?tag=CrashReport&x=1767232b800000
>>
>
> Right. Maybe we should not stop the test upon "workqueue lockup" message, for
> it is likely that the cause of lockup is that somebody is busy looping which
> should have been reported shortly as "rcu detected stall".
>
> Of course, there is possibility that "workqueue lockup" is reported because
> cond_resched() was used when explicit schedule_timeout_*() is required, which
> was the reason commit 82607adcf9cdf40f ("workqueue: implement lockup detector")
> was added.
>
> If we stop the test upon "workqueue lockup" message, maybe longer timeout (e.g.
> 300 seconds) is better so that rcu stall or hung task messages are reported
> if rcu stall or hung task is occurring.

Yes, we need to order the different stalls/lockups/hangs/etc. according to
what can trigger what. E.g. an rcu stall can trigger task hung and
workqueue lockup reports, but not the other way around.
There is https://github.com/google/syzkaller/issues/516 to track this,
but I have not yet had time to figure out all the required changes. If you
have additional details, please add them there.
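
To make the ordering idea concrete, here is a rough sketch of how "what can
trigger what" might be used to pick the report worth keeping. This is not
syzkaller code: the names CAN_TRIGGER and likely_root_reports are hypothetical,
and the only edges encoded are the two stated above (rcu stall -> task hung,
rcu stall -> workqueue lockup).

```python
# Hypothetical cause table: each key can trigger the report kinds in its set.
# Only the "rcu stall" edges come from this thread; anything else would be
# an assumption and is deliberately left out.
CAN_TRIGGER = {
    "rcu stall": {"task hung", "workqueue lockup"},
}


def likely_root_reports(observed):
    """Return the observed report kinds that no other observed kind could
    have triggered, sorted by name."""
    observed = set(observed)
    secondary = set()
    for kind in observed:
        # Mark as secondary every observed kind this one is able to trigger.
        secondary |= CAN_TRIGGER.get(kind, set()) & observed
    return sorted(observed - secondary)


if __name__ == "__main__":
    print(likely_root_reports(["workqueue lockup", "rcu stall"]))
```

With such a table, a workqueue lockup seen together with an rcu stall would be
attributed to the rcu stall, while a workqueue lockup seen alone would still be
reported as such.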