Re: [PATCH -V3 0/8] migrate_pages(): fix several bugs in error path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 19, 2022 at 7:44 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:
>
> > On Wed, 17 Aug 2022 16:14:00 +0800 Huang Ying <ying.huang@xxxxxxxxx> wrote:
> >
> >> error-inject.patch, test-migrate.c, and test-migrate.sh are as below.
> >> It turns out that error injection is an important tool to fix bugs in
> >> error path.
> >
> > Indeed, and thanks for doing this.
> >
> > Did you consider lib/*inject*.c?  If so, was it unsuitable?
>

Thanks, Ying. Great tips!

> I have done some experiments to use some existing error injection
> mechanisms in kernel to test the error path of migrate_pages().  After
> some googling, I found that the BPF based error injection described in
> the following URL is most suitable for my purpose.
>
> https://lwn.net/Articles/740146/
>
> Because the BPF seems quite flexible to satisfy various requirements of
> error injection.  With it, the execution of some functions can be
> skipped and some specified error code can be returned instead.
>
> Works out of box
> ================
>
> Some error injection functionality just works out of box.  For example,
> inject some page allocation error in some path.  Firstly,
> CONFIG_BPF_KPROBE_OVERRIDE needs to be enabled in kernel config.  Then,
> a simple bpftrace script as follows can be used to inject page
> allocation error during migrate_pages().
>
> --------------------ENOMEM-----------------------
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:should_fail_alloc_page / @in_migrate_pages > 0 / {
>         override(1);
> }
> -------------------------------------------------
>
> The call chain of error injection is specified via the first 2 lines.  I
> copied the methods used in BCC inject script.  Is there any better
> method to specify the call chain?
>
> We can inject error only for THP page allocation too.
>
> --------------------ENOMEM THP-------------------
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:should_fail_alloc_page / @in_migrate_pages > 0 && arg1 == 9 / {
>         override(1);
> }
> -------------------------------------------------
>
> Use some hack to override any function
> ======================================
>
> The in-kernel BPF based error injection mechanism can only override
> function return value for the functions in the whitelist, that is,
> functions marked with ALLOW_ERROR_INJECTION().  That is quite limited.
> The thorough error path testing needs to override the return value of
> arbitrary function.  So, I use a simple hack patch as follows for that.
>
> -----------------------8<---------------------------------
> From 3bcd401a3731bc7316d222501070a2a71fdf7170 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Tue, 20 Sep 2022 09:08:25 +0800
> Subject: [PATCH] dbg: allow override any function with bpf_error_injection
>
> ---
>  lib/error-inject.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/lib/error-inject.c b/lib/error-inject.c
> index 1afca1b1cdea..82a402e0f15c 100644
> --- a/lib/error-inject.c
> +++ b/lib/error-inject.c
> @@ -21,6 +21,7 @@ struct ei_entry {
>         void *priv;
>  };
>
> +#if 0
>  bool within_error_injection_list(unsigned long addr)
>  {
>         struct ei_entry *ent;
> @@ -36,6 +37,12 @@ bool within_error_injection_list(unsigned long addr)
>         mutex_unlock(&ei_mutex);
>         return ret;
>  }
> +#else
> +bool within_error_injection_list(unsigned long addr)
> +{
> +       return true;
> +}
> +#endif
>
>  int get_injectable_error_type(unsigned long addr)
>  {
> --
> 2.35.1
> ----------------------------------------------------------
>
> With this debug patch, most error path can be tested.  For example,
>
> --------------------ENOSYS THP + EAGAIN----------
> #include <linux/mm.h>
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:unmap_and_move / @in_migrate_pages > 0 / {
>         if (((struct page *)arg3)->flags & (1 << PG_head)) {
>                 override(-38);
>         } else {
>                 override(-11);
>         }
> }
> -------------------------------------------------
>
> With this, unmap_and_move() will return -ENOSYS (-38) for THP, and
> -EAGAIN (-11) for normal page.  This can be used to test the
> corresponding error path in migrate_pages().
>
> I think that it's quite common for developers to inject error for
> arbitrary function to test the error path.  Is it a good idea to turn on
> the arbitrary error injection if a special kernel configuration
> (e.g. CONFIG_BPF_KPROBE_OVERRIDE_ANY_FUNCTION) is enabled for debugging
> purpose only?
>
> Some hacks are still necessary for complete coverage
> ====================================================
>
> Even if we can override the return value of any function.  Some hacks
> are still necessary for complete coverage.  For example, some functions
> may be inlined, if we want to override its return value, we need to mark
> it with "noinline".  And some error cannot be injected with return value
> overridden directly.  For example, if we want to test when THP split
> isn't allowed condition in migrate_pages().  Then, some hack patch need
> to be used to do that.  For example, the below patch can do that.
>
> -----------------------8<---------------------------------
> From ca371806dc7f96148cbdf03fdf8f92309306a325 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Tue, 20 Sep 2022 09:37:53 +0800
> Subject: [PATCH] dbg: inject nosplit
>
> ---
>  mm/migrate.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 571d8c9fd5bc..d4ee76c285b2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -57,6 +57,11 @@
>
>  #include "internal.h"
>
> +static noinline bool error_inject_nosplit(void)
> +{
> +       return false;
> +}
> +
>  int isolate_movable_page(struct page *page, isolate_mode_t mode)
>  {
>         const struct movable_operations *mops;
> @@ -1412,6 +1417,9 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>         bool nosplit = (reason == MR_NUMA_MISPLACED);
>         bool no_subpage_counting = false;
>
> +       if (error_inject_nosplit())
> +               nosplit = true;
> +
>         trace_mm_migrate_pages_start(mode, reason);
>
>  thp_subpage_migration:
> --
> 2.35.1
> ----------------------------------------------------------
>
> With the help of the above patch, the following bpftrace script can
> inject the expected error,
>
> --------------------ENOMEM THP + nosplit---------
> kprobe:migrate_pages { @in_migrate_pages++; }
> kretprobe:migrate_pages  { @in_migrate_pages--; }
> kprobe:should_fail_alloc_page / @in_migrate_pages > 0 && arg1 == 9 / {
>         override(1);
> }
> kprobe:error_inject_nosplit / @in_migrate_pages > 0 / {
>         override(1);
> }
> -------------------------------------------------
>
> Although some hack patches are needed.  This is still simpler than my
> original hand-made error injection solution.  So I will recommend
> developers to use it in the error path testing in the future.
>
> Best Regards,
> Huang, Ying




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux