On Wed, Jun 26, 2019 at 8:43 PM Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> On Wed, Jun 26, 2019 at 10:27:08AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: abf02e29 Merge tag 'pm-5.2-rc6' of git://git.kernel.org/pu..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1435aaf6a00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=e5c77f8090a3b96b
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4bfbbf28a2e50ab07368
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11234c41a00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15d7f026a00000
> >
> > The bug was bisected to:
> >
> > commit 0c81ea5db25986fb2a704105db454a790c59709c
> > Author: Elad Raz <eladr@xxxxxxxxxxxx>
> > Date: Fri Oct 28 19:35:58 2016 +0000
> >
> > mlxsw: core: Add port type (Eth/IB) set API
>
> Um, so this doesn't pass the laugh test.
>
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10393a89a00000
>
> It looks like the automated bisection machinery got confused by two
> failures getting triggered by the same repro; the symptoms changed
> over time. Initially, the failure was:
>
> crashed: INFO: rcu detected stall in {sys_sendfile64,ext4_file_write_iter}
>
> Later, the failure changed to something completely different, and much
> earlier (before the test was even started):
>
> run #5: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor216456474" "root@10.128.15.205:./syz-executor216456474"]: exit status 1
> Connection timed out during banner exchange
> lost connection
>
> Looks like an opportunity to improve the bisection engine?

Hi Ted,

Yes, these infrastructure errors plague bisections episodically.
That's https://github.com/google/syzkaller/issues/1250

They did not confuse the bisection directly, as it understands that
these are infrastructure failures rather than kernel crashes. E.g. here
you can see that it correctly identified this run as OK and started
bisection in the v4.10 v4.9 range despite 2 scp failures:

testing release v4.9
testing commit 69973b830859bc6529a7a0468ba0d80ee5117826 with gcc (GCC) 5.5.0
run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" ...]: exit status 1
Connection timed out during banner exchange
run #1: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" ....]: exit status 1
Connection timed out during banner exchange
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect start v4.10 v4.9

Though, of course, they may confuse bisection indirectly by reducing
the number of tests per commit.

So far I haven't been able to gather any significant information about
these failures. We collect console logs, but for these runs they are
empty. It's easy to blame everything on GCE, but I don't have any
information that would point either way. These failures just appear
randomly in production and usually in batches...
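
To make the "indirect" effect concrete, here is a rough sketch of how the run log above could be tallied. This is not syzkaller's actual code: the function names, the "run #N: crashed: ..." line shape, the good/bad/skip rule, and the independent per-run crash probability are all assumptions made for illustration.

```python
import re

# Sample taken from the v4.9 test above; the elided scp arguments stay elided.
SAMPLE_LOG = [
    'run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" ...]: exit status 1',
    "Connection timed out during banner exchange",
    'run #1: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" ....]: exit status 1',
    "Connection timed out during banner exchange",
] + ["run #%d: OK" % i for i in range(2, 10)]


def classify_run(line):
    """Return 'ok', 'crash' or 'infra' for a run line, None for anything else."""
    m = re.match(r"run #\d+: (.*)", line)
    if not m:
        return None  # continuation line such as the ssh error text
    result = m.group(1)
    if result == "OK":
        return "ok"
    if result.startswith("crashed:"):  # assumed line shape for kernel crashes
        return "crash"
    if result.startswith("basic kernel testing failed"):
        return "infra"
    return None


def summarize(lines):
    """Count run outcomes; infrastructure failures are kept separate."""
    counts = {"ok": 0, "crash": 0, "infra": 0}
    for line in lines:
        kind = classify_run(line)
        if kind is not None:
            counts[kind] += 1
    return counts


def verdict(counts):
    """Hypothetical rule: any crash is bad, any clean run is good, else skip."""
    if counts["crash"] > 0:
        return "bad"
    if counts["ok"] > 0:
        return "good"
    return "skip"


def miss_probability(p_crash, effective_runs):
    """Chance that every effective run passes on a commit that is really bad,
    assuming each run crashes independently with probability p_crash."""
    return (1.0 - p_crash) ** effective_runs


if __name__ == "__main__":
    counts = summarize(SAMPLE_LOG)
    print(counts, verdict(counts))
    print(miss_probability(0.2, 10), miss_probability(0.2, counts["ok"]))
```

With a made-up 20% per-run crash rate, the chance of wrongly marking a bad commit as good grows from about 11% with 10 effective runs to about 17% with 8, which is the sense in which lost runs can mislead a bisection even when they are classified correctly.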