Re: [PATCH bpf-next] bpf: Reject struct_ops registration that uses module ptr and the module btf_id is missing

Eduard Zingerman <eddyz87@xxxxxxxxx> · Thu, 02 Jan 2025 22:14:11 -0800

On Fri, 2024-12-20 at 12:18 -0800, Martin KaFai Lau wrote:
> From: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
> 
> There is a UAF report in the bpf_struct_ops when CONFIG_MODULES=n.
> In particular, the report is on tcp_congestion_ops that has
> a "struct module *owner" member.
> 
> For struct_ops that has a "struct module *owner" member,
> it can be extended either by the regular kernel module or
> by the bpf_struct_ops. bpf_try_module_get() will be used
> to do the refcounting and different refcount is done
> based on the owner pointer. When CONFIG_MODULES=n,
> the btf_id of the "struct module" is missing:
> 
> WARN: resolve_btfids: unresolved symbol module
> 
> Thus, the bpf_try_module_get() cannot do the correct refcounting.
> 
> Not all subsystem's struct_ops requires the "struct module *owner" member.
> e.g. the recent sched_ext_ops.
> 
> This patch is to disable bpf_struct_ops registration if
> the struct_ops has the "struct module *" member and the
> "struct module" btf_id is missing. The btf_type_is_fwd() helper
> is moved to the btf.h header file for this test.
> 
> This has happened since the beginning of bpf_struct_ops which has gone
> through many changes. The Fixes tag is set to a recent commit that this
> patch can apply cleanly. Considering CONFIG_MODULES=n is not
> common and the age of the issue, targeting for bpf-next also.
> 
> Fixes: 1611603537a4 ("bpf: Create argument information for nullable arguments.")
> Reported-by: Robert Morris <rtm@xxxxxxxxxxxxx>
> Closes: https://lore.kernel.org/bpf/74665.1733669976@localhost/
> Signed-off-by: Martin KaFai Lau <martin.lau@xxxxxxxxxx>
> ---

Looks like this fix has not landed yet.
I tried it and it does fix the error reported in the "Closes:" link.

Tested-by: Eduard Zingerman <eddyz87@xxxxxxxxx>

It was a bit hard for me to figure out what went wrong from the description.
Could you please double-check my understanding below?
- when a struct_ops program is attached,
  bpf_struct_ops_map_update_elem() scans every member of the specific
  struct_ops type (e.g. struct tcp_congestion_ops) looking for fields
  with type 'struct module *';
- to find these fields the BTF id of 'struct module' is used; this id
  does not exist when CONFIG_MODULES=n, and
  bpf_struct_ops_map_update_elem() does not check whether it is non-zero;
- bpf_struct_ops_map_update_elem() initializes 'struct module *'
  fields with the magic value BPF_MODULE_OWNER; this initialization
  does not happen if the fields are not found;
- later bpf_try_module_get() is called by code specific to particular
  struct_ops, e.g. from tcp_cong.c:tcp_assign_congestion_control();
- the bpf_try_module_get() is implemented as follows:

    static inline bool bpf_try_module_get(const void *data, struct module *owner)
    {
    	if (owner == BPF_MODULE_OWNER)
    		return bpf_struct_ops_get(data);
    	else
    		return try_module_get(owner);
    }

  if the 'struct module *' fields are not correctly initialized to
  BPF_MODULE_OWNER, bpf_try_module_get() takes the else branch and calls
  try_module_get(), passing a bogus pointer to it.
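
To make the sequence above concrete, here is a minimal userspace sketch of
my understanding. It is not kernel code: every model_* / MODEL_* name, the
single-member layout, the numeric type ids and the sentinel value are made
up for illustration, and the -1 return is only a placeholder for whatever
error the patch actually returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; every model_* / MODEL_* name is hypothetical. */
struct module;	/* opaque, never dereferenced here */

/* Stand-in for the kernel's BPF_MODULE_OWNER sentinel (value is made up). */
#define MODEL_BPF_MODULE_OWNER ((struct module *)(uintptr_t)0x1234)

struct model_ops {
	struct module *owner;
};

enum model_path {
	MODEL_PATH_STRUCT_OPS,	/* owner matched the sentinel */
	MODEL_PATH_MODULE,	/* owner treated as a regular module pointer */
};

/*
 * Models the member scan: the owner field is tagged with the sentinel
 * only if its pointee type id equals the 'struct module' id.
 * owner_type_id is assumed non-zero; module_btf_id == 0 models the
 * unresolved btf_id seen with CONFIG_MODULES=n.
 */
static void model_update_elem(struct model_ops *ops, uint32_t owner_type_id,
			      uint32_t module_btf_id)
{
	assert(ops != NULL);
	if (owner_type_id == module_btf_id)
		ops->owner = MODEL_BPF_MODULE_OWNER;
}

/* Models the branch taken inside bpf_try_module_get(). */
static enum model_path model_try_module_get(const struct model_ops *ops)
{
	assert(ops != NULL);
	return ops->owner == MODEL_BPF_MODULE_OWNER ?
		MODEL_PATH_STRUCT_OPS : MODEL_PATH_MODULE;
}

/* Attach, then take a reference: reports which refcount path is used. */
static enum model_path model_attach(uint32_t owner_type_id,
				    uint32_t module_btf_id)
{
	struct model_ops ops = { .owner = NULL };

	model_update_elem(&ops, owner_type_id, module_btf_id);
	return model_try_module_get(&ops);
}

/*
 * Models the idea of the fix: refuse registration when the struct_ops
 * has a 'struct module *' member but the module btf_id is missing.
 */
static int model_register(bool has_module_ptr_member, uint32_t module_btf_id)
{
	if (has_module_ptr_member && module_btf_id == 0)
		return -1;	/* placeholder error value */
	return 0;
}
```

In this model, model_attach(7, 7) takes the MODEL_PATH_STRUCT_OPS branch,
while model_attach(7, 0) silently falls through to MODEL_PATH_MODULE, which
corresponds to the wrong refcounting described above. model_register(true, 0)
is rejected up front, so the second case can no longer be reached.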

Assuming the above is correct, the fix lgtm.

Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>

[...]