Re: [syzbot] WARNING in do_mkdirat

Marco Elver <elver@xxxxxxxxxx> · Mon, 12 Dec 2022 20:29:10 +0100

On Mon, 12 Dec 2022 at 19:58, Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Mon, Dec 12, 2022 at 11:29:11AM +0800, Hillf Danton wrote:
> > > You've completely misunderstood Al's point.  He's not whining about
> > > being cc'd, he's pointing out that this is ONLY USEFUL IF THE NTFS3
> > > MAINTAINERS ARE CC'd.  And they're not.  So this is just noise.
> > > And enough noise means that signal is lost.
> >
> > Call Trace:
> >  <TASK>
> >  inode_unlock include/linux/fs.h:761 [inline]
> >  done_path_create fs/namei.c:3857 [inline]
> >  do_mkdirat+0x2de/0x550 fs/namei.c:4064
> >  __do_sys_mkdirat fs/namei.c:4076 [inline]
> >  __se_sys_mkdirat fs/namei.c:4074 [inline]
> >  __x64_sys_mkdirat+0x85/0x90 fs/namei.c:4074
> >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >  do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
> >  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > Given the call trace above, how do you know the ntfs3 guys should also
> > be Cc'd in addition to AV? What if it would take more than three months
> > for syzbot to learn the skills in your mind? What is preventing you from
> > routing the report to ntfs3?
>
> If it takes 3 months for syzbot to take a look at the source code in
> their own #!@?! reproducer, or just to take a look at the strace link
> in the dashboard:
>
> [pid  3639] mount("/dev/loop0", "./file2", "ntfs3", MS_NOSUID|MS_NOEXEC|MS_DIRSYNC|MS_I_VERSION, "") = 0
>
> There's something really wrong.  The point Al has been making (and
> I've been making for multiple years) is that Syzbot has the
> information, but unfortunately, at the moment, it is only analyzing
> the stack trace, and it is not doing things that really could be
> done automatically --- and cloud VM time is cheap, and upstream
> maintainer time is expensive.  So by not improving syzbot in a way
> that really shouldn't be all that difficult, the syzbot maintainers are
> disrespecting the time of the upstream maintainers.
>
> So sure, we could ask Linus to triage all syzbot reports --- or we
> could ask Al to triage all syzbot file system reports --- but that is
> not a good use of upstream resources.
>
> And "we didn't know this is super annoying" isn't an excuse, because
> I've been asking for things like this *before* the COVID pandemic.  So
> if the Syzbot team won't listen to observations by a random Google
> engineer who happens to be an ext4 maintainer (or rather, I'm sure
> they were listening, but they didn't consider it important enough to
> staff and put on the roadmap), maybe something a bit
> more.... assertive by Al is something that will inspire them to
> prioritize this feature request "above the fold".  :-)
>
> And Al does have a point --- if a lot of upstream maintainers come to
> consider Syzbot reports less than useful, and either auto-file them
> to a junk folder or just ignore them because they are busy and the
> Probability(Usefulness) is close to zero, then recovering from that
> black eye to Syzbot's reputation is going to be a lot more difficult
> than if Syzbot had been made more respectful of upstream maintainer
> time much earlier.
>
> Now, to be fair to the Syzbot team, the Syzbot console has gotten much
> better.  You can now download the syzbot trace, and download the
> mounted file system, when before, you had to do a lot more work to
> extract the file system (which is stored in separate constant C
> arrays as compressed data) from the C reproducer.  So things have
> gotten better.
>
> But at the same time, characterizing a syzbot report is something to
> be done by every file system maintainer who looks at a syzbot report,
> because there is no way to add a tag to the syzbot report that this
> particular syzbot report *really* is an ntfs3 issue.  So any
> information that a single developer figures out when triaging a bug
> (is this potentially an ext4 bug, nope, it's an ntfs3 bug) has to be
> replicated by every single kernel developer looking at the Syzbot
> dashboard.  Which again, is not respectful of upstream maintainers'
> time.

This is being worked on:
https://github.com/google/syzkaller/issues/3393#issuecomment-1330305227

Teaching a bot the pattern matching skills of a human is non-trivial.
The current design will likely do the simplest thing: regex match
reproducers and map a match to some kernel source dir, for which the
maintainers are Cc'd. If you have better suggestions on how to
mechanize subsystem selection based on a reproducer, please shout.
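
To make the "simplest thing" concrete, here is a minimal sketch of the
regex approach, not the actual syzbot implementation. It assumes the
input is strace-style text like the mount() line quoted above. The
table FS_TYPE_TO_DIR, the pattern MOUNT_RE and the function
guess_source_dirs() are hypothetical names made up for illustration,
and the table entries are examples only.

```python
import re

# Hypothetical table: filesystem type string passed to mount(2) mapped to
# a kernel source directory. Illustrative entries, not syzbot's real data.
FS_TYPE_TO_DIR = {
    "ntfs3": "fs/ntfs3",
    "ext4": "fs/ext4",
    "btrfs": "fs/btrfs",
}

# Captures the third argument (the filesystem type) of a mount(...) call
# as it appears in strace-style output.
MOUNT_RE = re.compile(r'\bmount\("[^"]*",\s*"[^"]*",\s*"([^"]+)"')


def guess_source_dirs(log_text):
    """Return the source dirs implied by mount() calls, in first-seen order."""
    dirs = []
    for fs_type in MOUNT_RE.findall(log_text):
        source_dir = FS_TYPE_TO_DIR.get(fs_type)
        # Skip unknown filesystem types and avoid duplicate entries.
        if source_dir and source_dir not in dirs:
            dirs.append(source_dir)
    return dirs


if __name__ == "__main__":
    line = ('[pid  3639] mount("/dev/loop0", "./file2", "ntfs3", '
            'MS_NOSUID|MS_NOEXEC|MS_DIRSYNC|MS_I_VERSION, "") = 0')
    print(guess_source_dirs(line))
```

The resulting directories could then be fed to something like
scripts/get_maintainer.pl -f to pick the Cc list. A reproducer that
mounts several filesystems would yield several directories, and how to
rank them is left open here.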