Re: [PATCH v2] fault-inject: support systematic fault injection

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Wed, 24 May 2017 12:22:26 +0200

Hi,

I've made the current interface work with all types of our sandboxes.
For setuid the secret souse was prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) to
make /proc entries non-root owned. So I am fine with the current
version of the code.

Andrew,

I see that this is already in linux-next. Please proceed with pushing
it further.

Thanks

On Sun, Apr 9, 2017 at 12:39 PM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
> 2017-04-09 2:40 GMT+09:00 Dmitry Vyukov <dvyukov@xxxxxxxxxx>:
>> On Fri, Apr 7, 2017 at 6:47 PM, Akinobu Mita <akinobu.mita@xxxxxxxxx> wrote:
>>> 2017-04-07 3:33 GMT+09:00 Michal Hocko <mhocko@xxxxxxxxxx>:
>>>> [Let's add linux-api - please always cc this list when adding/modifying
>>>> user visible interfaces]
>>>>
>>>> On Tue 28-03-17 15:01:28, Dmitry Vyukov wrote:
>>>>> Add /proc/self/task/<current-tid>/fail-nth file that allows failing
>>>>> 0-th, 1-st, 2-nd and so on calls systematically.
>>>>> Excerpt from the added documentation:
>>>>
>>>> I didn't really get to read through details here but it just feels wrong
>>>> to add this debugging only feature into proc. It also smells like one
>>>> off thing as well.
>>>
>>> We have 'sched' (CONFIG_SCHED_DEBUG), 'latency' (CONFIG_LATENCYTOP)
>>> and 'make-it-fail' as debugging per-process proc files.  So it doesn't
>>> look very wrong to me.  But I would like to avoid per-process proc
>>> directory becoming messy. Do you think introducing /proc/<pid>/debug/
>>> directory for debugging stuff makes sense?
>>>
>>> Side note: 'fail-nth' was originally a single debugfs file
>>> /sys/kernel/debug/fail_once.  But it actually read/write current task's
>>> fail_nth field, so I suggested changing per process procfs file.i
>>> This change enables to inject N-th fail to kernel threads, too.
>>
>>
>> /sys/kernel/debug/fail_once (or fail_nth) looks more appropriate to me
>> for a optional testing feature. The fact that it currently
>> reads/writes a task_struct field is merely an implementation detail.
>> I would also prefer ioctl's. Then we don't need to preserve "symmetry"
>> for no useful reason and deal with nonsensical uses like setting it
>> for a non-current task and running cat on it.
>
> Sounds reasonable for adding ioctl interface as it can work with your
> setuid sandbox test.  But could you keep the procfs interface, too?
> Because both ioctl debugfs interface and procfs interface can co-exist
> and I would like to play it with procfs for a while.
>
>>>>> ===
>>>>> Write to this file of integer N makes N-th call in the current task fail
>>>>> (N is 0-based). Read from this file returns a single char 'Y' or 'N'
>>>>> that says if the fault setup with a previous write to this file was
>>>>> injected or not, and disables the fault if it wasn't yet injected.
>>>>> Note that this file enables all types of faults (slab, futex, etc).
>>>>> This setting takes precedence over all other generic settings like
>>>>> probability, interval, times, etc. But per-capability settings
>>>>> (e.g. fail_futex/ignore-private) take precedence over it.
>>>>> This feature is intended for systematic testing of faults in a single
>>>>> system call. See an example below.
>>>>> ===
>>>>>
>>>>> Why adding new setting:
>>>>> 1. Existing settings are global rather than per-task.
>>>>>    So parallel testing is not possible.
>>>>> 2. attr->interval is close but it depends on attr->count
>>>>>    which is non reset to 0, so interval does not work as expected.
>>>>> 3. Trying to model this with existing settings requires manipulations
>>>>>    of all of probability, interval, times, space, task-filter and
>>>>>    unexposed count and per-task make-it-fail files.
>>>>> 4. Existing settings are per-failure-type, and the set of failure
>>>>>    types is potentially expanding.
>>>>> 5. make-it-fail can't be changed by unprivileged user and aggressive
>>>>>    stress testing better be done from an unprivileged user.
>>>>>    Similarly, this would require opening the debugfs files to the
>>>>>    unprivileged user, as he would need to reopen at least times file
>>>>>    (not possible to pre-open before dropping privs).
>>>>>
>>>>> The proposed interface solves all of the above (see the example).
>>>>>
>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
>>>>> Cc: Akinobu Mita <akinobu.mita@xxxxxxxxx>
>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>>> Cc: linux-mm@xxxxxxxxx
>>>>>
>>>>> ---
>>>>> We want to integrate this into syzkaller fuzzer.
>>>>> A prototype has found 10 bugs in kernel in first day of usage:
>>>>> https://groups.google.com/forum/#!searchin/syzkaller/%22FAULT_INJECTION%22%7Csort:relevance
>>>>>
>>>>> Changes since v1:
>>>>>  - change file name from /sys/kernel/debug/fail_once
>>>>>    to /proc/self/task/<current-tid>/fail-nth as per
>>>>>    Akinobu suggestion
>>>>>
>>>>> ---
>>>>>  Documentation/fault-injection/fault-injection.txt | 78 +++++++++++++++++++++++
>>>>>  fs/proc/base.c                                    | 52 +++++++++++++++
>>>>>  include/linux/sched.h                             |  1 +
>>>>>  kernel/fork.c                                     |  4 ++
>>>>>  lib/fault-inject.c                                |  7 ++
>>>>>  5 files changed, 142 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/fault-injection/fault-injection.txt b/Documentation/fault-injection/fault-injection.txt
>>>>> index 415484f3d59a..192d8cbcc5f9 100644
>>>>> --- a/Documentation/fault-injection/fault-injection.txt
>>>>> +++ b/Documentation/fault-injection/fault-injection.txt
>>>>> @@ -134,6 +134,22 @@ use the boot option:
>>>>>       fail_futex=
>>>>>       mmc_core.fail_request=<interval>,<probability>,<space>,<times>
>>>>>
>>>>> +o proc entries
>>>>> +
>>>>> +- /proc/self/task/<current-tid>/fail-nth:
>>>>> +
>>>>> +     Write to this file of integer N makes N-th call in the current task fail
>>>>> +     (N is 0-based). Read from this file returns a single char 'Y' or 'N'
>>>>> +     that says if the fault setup with a previous write to this file was
>>>>> +     injected or not, and disables the fault if it wasn't yet injected.
>>>>> +     Note that this file enables all types of faults (slab, futex, etc).
>>>>> +     This setting takes precedence over all other generic debugfs settings
>>>>> +     like probability, interval, times, etc. But per-capability settings
>>>>> +     (e.g. fail_futex/ignore-private) take precedence over it.
>>>>> +
>>>>> +     This feature is intended for systematic testing of faults in a single
>>>>> +     system call. See an example below.
>>>>> +
>>>>>  How to add new fault injection capability
>>>>>  -----------------------------------------
>>>>>
>>>>> @@ -278,3 +294,65 @@ allocation failure.
>>>>>       # env FAILCMD_TYPE=fail_page_alloc \
>>>>>               ./tools/testing/fault-injection/failcmd.sh --times=100 \
>>>>>                  -- make -C tools/testing/selftests/ run_tests
>>>>> +
>>>>> +Systematic faults using fail-nth
>>>>> +---------------------------------
>>>>> +
>>>>> +The following code systematically faults 0-th, 1-st, 2-nd and so on
>>>>> +capabilities in the socketpair() system call.
>>>>> +
>>>>> +#include <sys/types.h>
>>>>> +#include <sys/stat.h>
>>>>> +#include <sys/socket.h>
>>>>> +#include <sys/syscall.h>
>>>>> +#include <fcntl.h>
>>>>> +#include <unistd.h>
>>>>> +#include <string.h>
>>>>> +#include <stdlib.h>
>>>>> +#include <stdio.h>
>>>>> +#include <errno.h>
>>>>> +
>>>>> +int main()
>>>>> +{
>>>>> +     int i, err, res, fail_nth, fds[2];
>>>>> +     char buf[128];
>>>>> +
>>>>> +     system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
>>>>> +     sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
>>>>> +     fail_nth = open(buf, O_RDWR);
>>>>> +     for (i = 0;; i++) {
>>>>> +             sprintf(buf, "%d", i);
>>>>> +             write(fail_nth, buf, strlen(buf));
>>>>> +             res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
>>>>> +             err = errno;
>>>>> +             read(fail_nth, buf, 1);
>>>>> +             if (res == 0) {
>>>>> +                     close(fds[0]);
>>>>> +                     close(fds[1]);
>>>>> +             }
>>>>> +             printf("%d-th fault %c: res=%d/%d\n", i, buf[0], res, err);
>>>>> +             if (buf[0] != 'Y')
>>>>> +                     break;
>>>>> +     }
>>>>> +     return 0;
>>>>> +}
>>>>> +
>>>>> +An example output:
>>>>> +
>>>>> +0-th fault Y: res=-1/23
>>>>> +1-th fault Y: res=-1/23
>>>>> +2-th fault Y: res=-1/23
>>>>> +3-th fault Y: res=-1/12
>>>>> +4-th fault Y: res=-1/12
>>>>> +5-th fault Y: res=-1/23
>>>>> +6-th fault Y: res=-1/23
>>>>> +7-th fault Y: res=-1/23
>>>>> +8-th fault Y: res=-1/12
>>>>> +9-th fault Y: res=-1/12
>>>>> +10-th fault Y: res=-1/12
>>>>> +11-th fault Y: res=-1/12
>>>>> +12-th fault Y: res=-1/12
>>>>> +13-th fault Y: res=-1/12
>>>>> +14-th fault Y: res=-1/12
>>>>> +15-th fault Y: res=-1/12
>>>>> +16-th fault N: res=0/12
>>>>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>>>>> index 6e8655845830..66001172249b 100644
>>>>> --- a/fs/proc/base.c
>>>>> +++ b/fs/proc/base.c
>>>>> @@ -1353,6 +1353,53 @@ static const struct file_operations proc_fault_inject_operations = {
>>>>>       .write          = proc_fault_inject_write,
>>>>>       .llseek         = generic_file_llseek,
>>>>>  };
>>>>> +
>>>>> +static ssize_t proc_fail_nth_write(struct file *file, const char __user *buf,
>>>>> +                                size_t count, loff_t *ppos)
>>>>> +{
>>>>> +     struct task_struct *task;
>>>>> +     int err, n;
>>>>> +
>>>>> +     task = get_proc_task(file_inode(file));
>>>>> +     if (!task)
>>>>> +             return -ESRCH;
>>>>> +     put_task_struct(task);
>>>>> +     if (task != current)
>>>>> +             return -EPERM;
>>>>> +     err = kstrtoint_from_user(buf, count, 10, &n);
>>>>> +     if (err)
>>>>> +             return err;
>>>>> +     if (n < 0 || n == INT_MAX)
>>>>> +             return -EINVAL;
>>>>> +     current->fail_nth = n + 1;
>>>>> +     return len;
>>>>> +}
>>>>> +
>>>>> +static ssize_t proc_fail_nth_read(struct file *file, char __user *buf,
>>>>> +                               size_t count, loff_t *ppos)
>>>>> +{
>>>>> +     struct task_struct *task;
>>>>> +     int err;
>>>>> +
>>>>> +     task = get_proc_task(file_inode(file));
>>>>> +     if (!task)
>>>>> +             return -ESRCH;
>>>>> +     put_task_struct(task);
>>>>> +     if (task != current)
>>>>> +             return -EPERM;
>>>>> +     if (count < 1)
>>>>> +             return -EINVAL;
>>>>> +     err = put_user((char)(current->fail_nth ? 'N' : 'Y'), buf);
>>>>> +     if (err)
>>>>> +             return err;
>>>>> +     current->fail_nth = 0;
>>>>> +     return 1;
>>>>> +}
>>>>> +
>>>>> +static const struct file_operations proc_fail_nth_operations = {
>>>>> +     .read           = proc_fail_nth_read,
>>>>> +     .write          = proc_fail_nth_write,
>>>>> +};
>>>>>  #endif
>>>>>
>>>>>
>>>>> @@ -3296,6 +3343,11 @@ static const struct pid_entry tid_base_stuff[] = {
>>>>>  #endif
>>>>>  #ifdef CONFIG_FAULT_INJECTION
>>>>>       REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
>>>>> +     /*
>>>>> +      * Operations on the file check that the task is current,
>>>>> +      * so we create it with 0666 to support testing under unprivileged user.
>>>>> +      */
>>>>> +     REG("fail-nth", 0666, proc_fail_nth_operations),
>>>>>  #endif
>>>>>  #ifdef CONFIG_TASK_IO_ACCOUNTING
>>>>>       ONE("io",       S_IRUSR, proc_tid_io_accounting),
>>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>>> index 543e0ea82684..7b50221fea51 100644
>>>>> --- a/include/linux/sched.h
>>>>> +++ b/include/linux/sched.h
>>>>> @@ -1897,6 +1897,7 @@ struct task_struct {
>>>>>  #endif
>>>>>  #ifdef CONFIG_FAULT_INJECTION
>>>>>       int make_it_fail;
>>>>> +     int fail_nth;
>>>>>  #endif
>>>>>       /*
>>>>>        * when (nr_dirtied >= nr_dirtied_pause), it's time to call
>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>> index 61284d8122fa..869c97a0a930 100644
>>>>> --- a/kernel/fork.c
>>>>> +++ b/kernel/fork.c
>>>>> @@ -545,6 +545,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>>>>>
>>>>>       kcov_task_init(tsk);
>>>>>
>>>>> +#ifdef CONFIG_FAULT_INJECTION
>>>>> +     tsk->fail_nth = 0;
>>>>> +#endif
>>>>> +
>>>>>       return tsk;
>>>>>
>>>>>  free_stack:
>>>>> diff --git a/lib/fault-inject.c b/lib/fault-inject.c
>>>>> index 6a823a53e357..d6516ba64d33 100644
>>>>> --- a/lib/fault-inject.c
>>>>> +++ b/lib/fault-inject.c
>>>>> @@ -107,6 +107,12 @@ static inline bool fail_stacktrace(struct fault_attr *attr)
>>>>>
>>>>>  bool should_fail(struct fault_attr *attr, ssize_t size)
>>>>>  {
>>>>> +     if (in_task() && current->fail_nth) {
>>>>> +             if (--current->fail_nth == 0)
>>>>> +                     goto fail;
>>>>> +             return false;
>>>>> +     }
>>>>> +
>>>>>       /* No need to check any other properties if the probability is 0 */
>>>>>       if (attr->probability == 0)
>>>>>               return false;
>>>>> @@ -134,6 +140,7 @@ bool should_fail(struct fault_attr *attr, ssize_t size)
>>>>>       if (!fail_stacktrace(attr))
>>>>>               return false;
>>>>>
>>>>> +fail:
>>>>>       fail_dump(attr);
>>>>>
>>>>>       if (atomic_read(&attr->times) != -1)
>>>>> --
>>>>> 2.12.2.564.g063fe858b8-goog
>>>>>
>>>>> --
>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
>>>>> see: http://www.linux-mm.org/ .
>>>>> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
>>>>
>>>> --
>>>> Michal Hocko
>>>> SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>