On Fri, Apr 06, 2018 at 08:32:26AM +1000, Dave Chinner wrote:
> On Wed, Apr 04, 2018 at 08:24:54PM -0700, Matthew Wilcox wrote:
> > On Wed, Apr 04, 2018 at 11:22:00PM -0400, Theodore Y. Ts'o wrote:
> > > On Wed, Apr 04, 2018 at 12:35:04PM -0700, Matthew Wilcox wrote:
> > > > On Wed, Apr 04, 2018 at 09:24:05PM +0200, Dmitry Vyukov wrote:
> > > > > On Tue, Apr 3, 2018 at 4:01 AM, syzbot
> > > > > <syzbot+dc5ab2babdf22ca091af@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
> > > > > > WARNING: CPU: 1 PID: 4441 at kernel/locking/rwsem.c:133 up_write+0x1cc/0x210
> > > > > > kernel/locking/rwsem.c:133
> > > > > > Kernel panic - not syncing: panic_on_warn set ...
> > > >
> > > > Message-Id: <1522852646-2196-1-git-send-email-longman@xxxxxxxxxx>
> > >
> > > We were way ahead of syzbot in this case. :-)
> >
> > Not really ... syzbot caught it Monday evening ;-)
>
> Rather than arguing over who reported it first, I think that time
> would be better spent reflecting on why the syzbot report was
> completely ignored until *after* Ted diagnosed the issue
> independently and Waiman had already fixed it....
>
> Clearly there is scope for improvement here.
>
> Cheers,

Well, ultimately a human needed to investigate the syzbot bug report to figure out what was really going on. In my view, the largest problem is that there are simply too many bugs, so many of them are getting ignored. If there were only a few bugs, then Dmitry would investigate each one and send a "real" bug report of better quality than the automated system can provide, or even send a fix directly. But in reality, on the same day this bug was reported, syzbot also found 10 other bugs, and in the previous 2 days it had found 38 more. No single person can keep up with that. You can see the current bug list, which has 172 open bugs, on the dashboard at https://syzkaller.appspot.com/. Yes, the kernel really is that broken.
Though, of course, most bugs are in specific modules, not the core kernel. And although quite a few of these bugs will turn out to be duplicates or even already fixed, a human still has to look at each one to figure that out. (That said, I do think that syzbot should try to automatically detect, via bisection, when a reproducible bug has already been fixed. It would cause a few bugs to be incorrectly considered fixed, but it may be a worthwhile tradeoff.) These bugs are all over the kernel as well, so most developers don't see the big picture; they just see a few bugs for "their" subsystem on "their" subsystem's mailing list, and sometimes demand special attention.

Of course, it's great when people suggest ways to improve the process. But it's not great when people just don't feel responsible for fixing bugs and wait for Someone Else to do it. I'm hoping that in the future the syzbot "team", which seems to actually be just Dmitry now, can get more resources towards helping fix the bugs. But either way, in the end Linux is a community effort.

Note also that syzbot wasn't super useful in this particular case, because people running xfstests came across the same bug. But this is actually a rare case. Most syzbot bug reports have been for weird corner cases or races that no one had ever thought of before, so there are no existing tests that find them.

Thanks,

Eric