Re: [PATCH v2] panic: Taint kernel if fault injection has been used

On Tue, 6 Dec 2022 20:01:46 -0800
Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:


> >   
> > > At this point crash dump might be necessary to debug.  
> > 
> > Yes. So the TAINT flag can help. Please consider that the TAINT flag
> > doesn't mean you are guilty, but this is just a hint for debugging.
> > (good for the first triage)  
> 
> I think you misunderstand the reason behind 'tainted' flags.
> It's 'hint for debugging' only on the surface.
> See Documentation/admin-guide/tainted-kernels.rst
> "... That's why bug reports
> from tainted kernels will often be ignored by developers, hence try to reproduce
> problems with an untainted kernel."

You conveniently left out the first part of that paragraph. Showing just a
portion of a statement can be very misleading. Let me add the whole
paragraph here:

   The kernel will mark itself as 'tainted' when something occurs that
   might be relevant later when investigating problems. Don't worry too much about this,
   most of the time it's not a problem to run a tainted kernel; the information is
   mainly of interest once someone wants to investigate some problem, as its real
   cause might be the event that got the kernel tainted. That's why bug reports
   from tainted kernels will often be ignored by developers, hence try to reproduce
   problems with an untainted kernel.

Let me stress the very first sentence above:

   The kernel will mark itself as 'tainted' when something occurs that
   might be relevant later when investigating problems.

I think you are the one that is misunderstanding what a taint is. It most
definitely is about giving hints for debugging. That's why the very first
sentence of that paragraph, as well as the entire document, explicitly
states "might be relevant later when investigating problems".


> 
> When 'error injection' finds a kernel bug the kernel developers need to
> look into it regardless whether it's syzbot error injection
> or whatever other mechanism.
> 

And this is a very useful taint. Just like:

  2  _/S       4  kernel running on an out of specification system
  5  _/B      32  bad page referenced or some unexpected page flags
  7  _/D     128  kernel died recently, i.e. there was an OOPS or BUG
 10  _/C    1024  staging driver was loaded
 11  _/I    2048  workaround for bug in platform firmware applied
 14  _/L   16384  soft lockup occurred 
 17  _/T  131072  kernel was built with the struct randomization plugin

None of the above should be ignored by developers; they are all useful
hints for debugging the issue.
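
As a rough illustration of how those flags can be used for first triage,
here is a minimal sketch that decodes a taint bitmask into the letters and
descriptions listed above. The table in the code is only the subset quoted
in this mail, not the full list from tainted-kernels.rst, and the names
decode_taint() and undecoded_bits() are made up for this example.

```python
from pathlib import Path

# Subset of taint bits copied from the table quoted in this mail.
# Maps bit number -> (letter, description). NOT the complete kernel list.
TAINT_BITS = {
    2: ("S", "kernel running on an out of specification system"),
    5: ("B", "bad page referenced or some unexpected page flags"),
    7: ("D", "kernel died recently, i.e. there was an OOPS or BUG"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for bug in platform firmware applied"),
    14: ("L", "soft lockup occurred"),
    17: ("T", "kernel was built with the struct randomization plugin"),
}


def decode_taint(value: int) -> list[tuple[int, str, str]]:
    """Return (bit, letter, description) for each known bit set in value."""
    return [
        (bit, letter, desc)
        for bit, (letter, desc) in sorted(TAINT_BITS.items())
        if value & (1 << bit)
    ]


def undecoded_bits(value: int) -> int:
    """Return the part of value that this subset table does not cover."""
    known = sum(1 << bit for bit in TAINT_BITS)
    return value & ~known


if __name__ == "__main__":
    # /proc/sys/kernel/tainted holds the current taint value as a decimal number.
    path = Path("/proc/sys/kernel/tainted")
    if path.exists():
        value = int(path.read_text().strip())
        for bit, letter, desc in decode_taint(value):
            print(f"{bit:3d}  {letter}  {1 << bit:7d}  {desc}")
        rest = undecoded_bits(value)
        if rest:
            print(f"bits outside this subset: {rest:#x}")
```

A value of 0 means the kernel is not tainted. Any bits reported as outside
the subset need to be looked up in tainted-kernels.rst.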


> To change the topic to something ... else...
> 
> We've just hit this panic using rethook.
> [   49.235708] ==================================================================
> [   49.236243] BUG: KASAN: use-after-free in rethook_try_get+0x7e/0x380
> [   49.236693] Read of size 8 at addr ffff888102e62c88 by task test_progs/1688
> [   49.240398]  kasan_report+0x90/0x190
> [   49.240934]  rethook_try_get+0x7e/0x380
> [   49.244885]  fprobe_handler.part.1+0x119/0x1f0
> [   49.245505]  arch_ftrace_ops_list_func+0x17d/0x1d0
> [   49.246544]  ftrace_regs_call+0x5/0x52
> [   49.247411]  bpf_fentry_test1+0x5/0x10
> 
> [   49.262578] Allocated by task 1692:
> [   49.262804]  kasan_save_stack+0x1c/0x40
> [   49.263059]  kasan_set_track+0x21/0x30
> [   49.263335]  __kasan_kmalloc+0x7a/0x90
> [   49.263624]  rethook_alloc+0x2c/0xa0
> [   49.263879]  fprobe_init_rethook+0x6d/0x170
> [   49.264154]  register_fprobe_ips+0xae/0x130
> 
> [   49.265938] Freed by task 0:
> [   49.266153]  kasan_save_stack+0x1c/0x40
> [   49.266440]  kasan_set_track+0x21/0x30
> [   49.266705]  kasan_save_free_info+0x26/0x40
> [   49.266995]  __kasan_slab_free+0x103/0x190
> [   49.267282]  __kmem_cache_free+0x1b7/0x3a0
> [   49.267559]  rcu_core+0x4d8/0xd50
> 
> [   49.268181] Last potentially related work creation:
> [   49.268565]  kasan_save_stack+0x1c/0x40
> [   49.268898]  __kasan_record_aux_stack+0xa1/0xb0
> [   49.269260]  call_rcu+0x47/0x360
> [   49.269526]  unregister_fprobe+0x47/0x80
> 
> [   49.281382] general protection fault, probably for non-canonical address 0x57e006e00000000: 0000 [#1] PREEMPT SMP KASAN
> [   49.282226] CPU: 6 PID: 1688 Comm: test_progs Tainted: G    B      O       6.1.0-rc7-01508-gf0c5a2d9f234 #4343
> [   49.283751] RIP: 0010:rethook_trampoline_handler+0xff/0x1d0
> [   49.289900] Call Trace:
> [   49.290083]  <TASK>
> [   49.290248]  arch_rethook_trampoline_callback+0x6c/0xa0
> [   49.290631]  arch_rethook_trampoline+0x2c/0x50
> [   49.290964]  ? lock_release+0xad/0x3f0
> [   49.291245]  ? bpf_prog_test_run_tracing+0x235/0x380
> [   49.291609]  trace_clock_x86_tsc+0x10/0x10
> 
> This is just running bpf selftests in parallel mode on 16-cpu VM on bpf-next.
> Notice 'Tainted' flags.
> Please take a look.
> 

"G - Proprietary module" - "O - out of tree module"

Can you reproduce this without those taints?

-- Steve


