Re: [PATCH bpf-next v6 1/2] bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2023/4/1 5:25, Jiri Olsa wrote:
> On Fri, Mar 31, 2023 at 01:08:42PM +0200, Petr Mladek wrote:
>> On Fri 2023-03-31 17:15:56, Leizhen (ThunderTown) wrote:
>>>
>>>
>>> On 2023/3/31 16:31, Petr Mladek wrote:
>>>> On Thu 2023-03-30 22:59:12, Jiri Olsa wrote:
>>>>> On Thu, Mar 30, 2023 at 08:26:41PM +0800, Leizhen (ThunderTown) wrote:
>>>>>>
>>>>>>
>>>>>> On 2023/3/30 15:29, Jiri Olsa wrote:
>>>>>>> ping,
>>>>>>>
>>>>>>> Petr, Zhen, any comment on discussion below?
>>>>>>>
>>>>>>> thanks,
>>>>>>> jirka
>>>>>>>
>>>>>>> On Thu, Mar 23, 2023 at 03:00:25PM +0100, Jiri Olsa wrote:
>>>>>>>> On Wed, Mar 22, 2023 at 09:03:46AM -0700, Alexei Starovoitov wrote:
>>>>>>>>> On Wed, Mar 22, 2023 at 5:14 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 22, 2023 at 10:49:38AM +0100, Artem Savkov wrote:
>>>>>>>>>>
>>>>>>>>>> SNIP
>>>>>>>>>>
>>>>>>>>>>>>> Hm, do we even need to preempt_disable? IIUC, preempt_disable is used
>>>>>>>>>>>>> in module kallsyms to prevent taking the module lock b/c kallsyms are
>>>>>>>>>>>>> used in the oops path. That shouldn't be an issue here, is that correct?
>>>>>>>>>>>>
>>>>>>>>>>>> btf_try_get_module calls try_module_get which disables the preemption,
>>>>>>>>>>>> so no need to call it in here
>>>>>>>>>>>
>>>>>>>>>>> It does, but it reenables preemption right away so it is enabled by the
>>>>>>>>>>> time we call find_kallsyms_symbol_value(). I am getting the following
>>>>>>>>>>> lockdep splat while running module_fentry_shadow test from test_progs.
>>>>>>>>>>>
>>>>>>>>>>> [   12.017973][  T488] =============================
>>>>>>>>>>> [   12.018529][  T488] WARNING: suspicious RCU usage
>>>>>>>>>>> [   12.018987][  T488] 6.2.0.bpf-test-13063-g6a9f5cdba3c5 #804 Tainted: G           OE
>>>>>>>>>>> [   12.019898][  T488] -----------------------------
>>>>>>>>>>> [   12.020391][  T488] kernel/module/kallsyms.c:448 suspicious rcu_dereference_check() usage!
>>>>>>>>>>> [   12.021335][  T488]
>>>>>>>>>>> [   12.021335][  T488] other info that might help us debug this:
>>>>>>>>>>> [   12.021335][  T488]
>>>>>>>>>>> [   12.022416][  T488]
>>>>>>>>>>> [   12.022416][  T488] rcu_scheduler_active = 2, debug_locks = 1
>>>>>>>>>>> [   12.023297][  T488] no locks held by test_progs/488.
>>>>>>>>>>> [   12.023854][  T488]
>>>>>>>>>>> [   12.023854][  T488] stack backtrace:
>>>>>>>>>>> [   12.024336][  T488] CPU: 0 PID: 488 Comm: test_progs Tainted: G           OE      6.2.0.bpf-test-13063-g6a9f5cdba3c5 #804
>>>>>>>>>>> [   12.025290][  T488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
>>>>>>>>>>> [   12.026108][  T488] Call Trace:
>>>>>>>>>>> [   12.026381][  T488]  <TASK>
>>>>>>>>>>> [   12.026649][  T488]  dump_stack_lvl+0xb4/0x110
>>>>>>>>>>> [   12.027060][  T488]  lockdep_rcu_suspicious+0x158/0x1f0
>>>>>>>>>>> [   12.027541][  T488]  find_kallsyms_symbol_value+0xe8/0x110
>>>>>>>>>>> [   12.028028][  T488]  bpf_check_attach_target+0x838/0xa20
>>>>>>>>>>> [   12.028511][  T488]  check_attach_btf_id+0x144/0x3f0
>>>>>>>>>>> [   12.028957][  T488]  ? __pfx_cmp_subprogs+0x10/0x10
>>>>>>>>>>> [   12.029408][  T488]  bpf_check+0xeec/0x1850
>>>>>>>>>>> [   12.029799][  T488]  ? ktime_get_with_offset+0x124/0x1d0
>>>>>>>>>>> [   12.030247][  T488]  bpf_prog_load+0x87a/0xed0
>>>>>>>>>>> [   12.030627][  T488]  ? __lock_release+0x5f/0x160
>>>>>>>>>>> [   12.031010][  T488]  ? __might_fault+0x53/0xb0
>>>>>>>>>>> [   12.031394][  T488]  ? selinux_bpf+0x6c/0xa0
>>>>>>>>>>> [   12.031756][  T488]  __sys_bpf+0x53c/0x1240
>>>>>>>>>>> [   12.032115][  T488]  __x64_sys_bpf+0x27/0x40
>>>>>>>>>>> [   12.032476][  T488]  do_syscall_64+0x3e/0x90
>>>>>>>>>>> [   12.032835][  T488]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>>>>>>>>>
>>>>>>>>>> --- a/kernel/module/kallsyms.c
>>>>>>>>>> +++ b/kernel/module/kallsyms.c
>>>>>> Commit 91fb02f31505 ("module: Move kallsyms support into a separate file") hides
>>>>>> the answer. find_kallsyms_symbol_value() was originally a static function, and it
>>>>>> is only called by module_kallsyms_lookup_name() and is preemptive-protected.
>>>>>>
>>>>>> Now that we've added a call to function find_kallsyms_symbol_value(), it seems like
>>>>>> we should do the same thing as function module_kallsyms_lookup_name().
>>>>>>
>>>>>> Like this?
>>>>>> +				mod = btf_try_get_module(btf);
>>>>>> +				if (mod) {
>>>>>> +					preempt_disable();
>>>>>> +					addr = find_kallsyms_symbol_value(mod, tname);
>>>>>> +					preempt_enable();
>>>>>> +				} else
>>>>>> +					addr = 0;
>>>>>
>>>>> yes, that's what I did above, but I was just curious about the strange
>>>>> RCU usage Alexei commented on earlier:
>>>>>
>>>>> 	>>> +unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name)
>>>>> 	>>> +{
>>>>> 	>>> +       unsigned long ret;
>>>>> 	>>> +
>>>>> 	>>> +       preempt_disable();
>>>>> 	>>> +       ret = __find_kallsyms_symbol_value(mod, name);
>>>>> 	>>> +       preempt_enable();
>>>>> 	>>> +       return ret;
>>>>> 	>>> +}
>>>>> 	>>
>>>>> 	>> That doesn't look right.
>>>>> 	>> I think the issue is misuse of rcu_dereference_sched in
>>>>> 	>> find_kallsyms_symbol_value.
>>>>> 	>
>>>>> 	> it seems to be using rcu pointer to keep symbols for module init time and
>>>>> 	> then core symbols for after init.. and switch between them when module is
>>>>> 	> loaded, hence the strange rcu usage I think
>>>
>>> load_module
>>> 	post_relocation
>>> 		add_kallsyms
>>> 			mod->kallsyms = (void __rcu *)mod->init_layout.base + info->mod_kallsyms_init_off;   (1)
>>> 	do_init_module
>>> 		freeinit->module_init = mod->init_layout.base;
>>> 		rcu_assign_pointer(mod->kallsyms, &mod->core_kallsyms);                                      (2)
>>> 		if (llist_add(&freeinit->node, &init_free_list))
>>> 			schedule_work(&init_free_wq);
>>>
>>> do_free_init
>>> 	synchronize_rcu();
>>> 	module_memfree(initfree->module_init);
>>>
>>> IIUC, the RCU can help synchronize_rcu() in do_free_init() to make sure that no one
>>> is still using the first mod->kallsyms (1). If find_kallsyms_symbol_value() is executed
>>> between (1) and (2).
>>
>> Yes, this seems to be another scenario where the RCU synchronization/access
>> is needed.
> 
> thanks for the details
> 
> still curious.. confusing part for me is the use of rcu_dereference in
> add_kallsyms IIUC there's no need for that because mod->kallsyms is not
> exposed at that time? we could do without it like in patch below?

Looks good to me. Prepare the data first, and then pointer to it.
This is the standard operation procedure of the RCU. But there
may be special considerations that we don't know about.


> 
> thanks,
> jirka
> 
> 
> ---
> diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
> index bdc911dbcde5..bc1e748a1357 100644
> --- a/kernel/module/kallsyms.c
> +++ b/kernel/module/kallsyms.c
> @@ -170,20 +170,18 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  	Elf_Sym *dst;
>  	char *s;
>  	Elf_Shdr *symsec = &info->sechdrs[info->index.sym];
> +	struct mod_kallsyms *kallsyms;
>  	unsigned long strtab_size;
>  
> -	/* Set up to point into init section. */
> -	mod->kallsyms = (void __rcu *)mod->init_layout.base +
> -		info->mod_kallsyms_init_off;
> +	kallsyms = mod->init_layout.base + info->mod_kallsyms_init_off;
>  
> -	rcu_read_lock();
>  	/* The following is safe since this pointer cannot change */
> -	rcu_dereference(mod->kallsyms)->symtab = (void *)symsec->sh_addr;
> -	rcu_dereference(mod->kallsyms)->num_symtab = symsec->sh_size / sizeof(Elf_Sym);
> +	kallsyms->symtab = (void *)symsec->sh_addr;
> +	kallsyms->num_symtab = symsec->sh_size / sizeof(Elf_Sym);
>  	/* Make sure we get permanent strtab: don't use info->strtab. */
> -	rcu_dereference(mod->kallsyms)->strtab =
> +	kallsyms->strtab =
>  		(void *)info->sechdrs[info->index.str].sh_addr;
> -	rcu_dereference(mod->kallsyms)->typetab = mod->init_layout.base + info->init_typeoffs;
> +	kallsyms->typetab = mod->init_layout.base + info->init_typeoffs;
>  
>  	/*
>  	 * Now populate the cut down core kallsyms for after init
> @@ -193,20 +191,20 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  	mod->core_kallsyms.strtab = s = mod->data_layout.base + info->stroffs;
>  	mod->core_kallsyms.typetab = mod->data_layout.base + info->core_typeoffs;
>  	strtab_size = info->core_typeoffs - info->stroffs;
> -	src = rcu_dereference(mod->kallsyms)->symtab;
> -	for (ndst = i = 0; i < rcu_dereference(mod->kallsyms)->num_symtab; i++) {
> -		rcu_dereference(mod->kallsyms)->typetab[i] = elf_type(src + i, info);
> +	src = kallsyms->symtab;
> +	for (ndst = i = 0; i < kallsyms->num_symtab; i++) {
> +		kallsyms->typetab[i] = elf_type(src + i, info);
>  		if (i == 0 || is_livepatch_module(mod) ||
>  		    is_core_symbol(src + i, info->sechdrs, info->hdr->e_shnum,
>  				   info->index.pcpu)) {
>  			ssize_t ret;
>  
>  			mod->core_kallsyms.typetab[ndst] =
> -			    rcu_dereference(mod->kallsyms)->typetab[i];
> +			    kallsyms->typetab[i];
>  			dst[ndst] = src[i];
>  			dst[ndst++].st_name = s - mod->core_kallsyms.strtab;
>  			ret = strscpy(s,
> -				      &rcu_dereference(mod->kallsyms)->strtab[src[i].st_name],
> +				      &kallsyms->strtab[src[i].st_name],
>  				      strtab_size);
>  			if (ret < 0)
>  				break;
> @@ -214,8 +212,10 @@ void add_kallsyms(struct module *mod, const struct load_info *info)
>  			strtab_size -= ret + 1;
>  		}
>  	}
> -	rcu_read_unlock();
>  	mod->core_kallsyms.num_symtab = ndst;
> +
> +	/* Set up to point into init section. */
> +	rcu_assign_pointer(mod->kallsyms, kallsyms);
>  }
>  
>  #if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
> .
> 

-- 
Regards,
  Zhen Lei



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux