On Thu, Jun 6, 2019 at 5:25 PM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>
> On 06.06.2019 18:18, Dmitry Vyukov wrote:
> > On Thu, Jun 6, 2019 at 4:54 PM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
> >>
> >> On 06.06.2019 17:40, Dmitry Vyukov wrote:
> >>> On Thu, Jun 6, 2019 at 3:43 PM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 06.06.2019 16:13, J. Bruce Fields wrote:
> >>>>> On Thu, Jun 06, 2019 at 10:47:43AM +0300, Kirill Tkhai wrote:
> >>>>>> This may be connected with that shrinker unregistering is forgotten on error path.
> >>>>>
> >>>>> I was wondering about that too. Seems like it would be hard to hit
> >>>>> reproduceably though: one of the later allocations would have to fail,
> >>>>> then later you'd have to create another namespace and this time have a
> >>>>> later module's init fail.
> >>>>
> >>>> Yes, it's had to bump into this in real life.
> >>>>
> >>>> AFAIU, syzbot triggers such the problem by using fault-injections
> >>>> on allocation places should_failslab()->should_fail(). It's possible
> >>>> to configure a specific slab, so the allocations will fail with
> >>>> requested probability.
> >>>
> >>> No fault injection was involved in triggering of this bug.
> >>> Fault injection is clearly visible in console log as "INJECTING
> >>> FAILURE at this stack track" splats and also for bugs with repros it
> >>> would be noted in the syzkaller repro as "fault_call": N. So somehow
> >>> this bug was triggered as is.
> >>>
> >>> But overall syzkaller can do better then the old probabilistic
> >>> injection. The probabilistic injection tend to both under-test what we
> >>> want to test and also crash some system services. syzkaller uses the
> >>> new "systematic fault injection" that allows to test specifically each
> >>> failure site separately in each syscall separately.
> >>
> >> Oho! Interesting.
> >
> > If you are interested.
> > You write N into /proc/thread-self/fail-nth
> > (say, 5) then it will cause failure of the N-th (5-th) failure site in
> > the next syscall in this task only. And by reading it back after the
> > syscall you can figure out if the failure was indeed injected or not
> > (or the syscall had less than 5 failure sites).
> > Then, for each syscall in a test (or only for one syscall of
> > interest), we start by writing "1" into /proc/thread-self/fail-nth; if
> > the failure was injected, write "2" and restart the test; if the
> > failure was injected, write "3" and restart the test; and so on, until
> > the failure wasn't injected (tested all failure sites).
> > This guarantees systematic testing of each error path with minimal
> > number of runs. This has obvious extensions to "each pair of failure
> > sites" (to test failures on error paths), but it's not supported atm.
>
> And what you do in case of a tested syscall has pre-requisites? Say,
> you test close(), which requires open() and some IO before. Are such
> the dependencies statically declared in some configuration file? Or
> you test any repeatable sequence of syscalls?

There are several things at play here.

1. syzkaller has a notion of "resources". A resource is something that
is produced by one system call and consumed by another, like a file
descriptor. E.g. see this for userfault fd:
https://github.com/google/syzkaller/blob/698773cb4fbe8873ee0a2c37b86caef01e2c6159/sys/linux/uffd.txt#L8-L12
This allows syzkaller to understand that there is something called
fd_uffd that is produced by userfaultfd() and then needs to be passed
to ioctl$UFFDIO_API(). So for close() it knows that it needs to get the
fd from somewhere first.

2. Syscalls that are not explicitly tied together by any resource it
will just try to combine randomly.

3. There is coverage-guided reinforcement learning.
When it discovers some sensible combination of syscalls (as indicated
by new kernel code coverage), it memorizes that program for future
mutations, to get even more interesting and more sensible programs.
This allows syzkaller to build more and more interesting programs in
small incremental steps (this is the general idea of coverage-guided
fuzzing).
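
Coming back to the fail-nth procedure quoted above, the enumeration
loop can be sketched roughly as below. This is only a toy model in
Python, not syzkaller code and not the kernel interface: SimulatedTask,
its fail_nth field and enumerate_failure_sites() are made-up names, and
the sketch assumes that reading back 0 after the syscall means the
failure was injected, while a remaining positive count means the
syscall had fewer failure sites than N.

```python
class SimulatedTask:
    """Toy stand-in for a task's fail-nth counter (hypothetical model)."""

    def __init__(self):
        self.fail_nth = 0  # 0 means no injection is armed

    def syscall(self, failure_sites):
        """Pretend syscall that passes through `failure_sites` failure sites.

        Returns the 1-based index of the site that failed, or None if the
        call completed without an injected failure.
        """
        for site in range(1, failure_sites + 1):
            if self.fail_nth:
                self.fail_nth -= 1
                if self.fail_nth == 0:
                    return site  # injected: the error path starts here
        return None


def enumerate_failure_sites(task, failure_sites):
    """Arm N = 1, 2, 3, ... and rerun until no failure is injected.

    Returns (list of N values that were injected, total number of runs).
    """
    injected_sites = []
    n = 1
    while True:
        task.fail_nth = n               # "write N into fail-nth"
        task.syscall(failure_sites)     # restart the test
        injected = task.fail_nth == 0   # "read it back" (assumed: 0 == injected)
        task.fail_nth = 0               # disarm before the next run
        if not injected:
            return injected_sites, n    # all failure sites have been tested
        injected_sites.append(n)
        n += 1
```

For a syscall with 3 failure sites this model needs 4 runs: three with
an injected failure and a final one that confirms there is no 4th
site. On a real kernel with fault injection enabled, the two marked
steps would be a write to and a read from /proc/thread-self/fail-nth
instead of the field accesses.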