Re: [bug] kernel: bpf: syscall: a possible sleep-in-atomic bug in __bpf_prog_put()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 5/21/23 6:39 AM, Teng Qi wrote:
Thank you.

 > Your above analysis makes sense if indeed that kvfree cannot appear
 > inside a spin lock region or RCU read lock region. But is it true?
 > I checked a few code paths in kvfree/kfree. It is either guarded
 > with local_irq_save/restore or by
 > spin_lock_irqsave/spin_unlock_
 > irqrestore, etc. Did I miss
 > anything? Are you talking about RT kernel here?

To see the sleepable possibility of kvfree, it is important to analyze the
following calling stack:
mm/util.c: 645 kvfree()
mm/vmalloc.c: 2763 vfree()

In kvfree(), to call vfree, if the pointer addr points to memory allocated by
vmalloc(), it calls vfree().
void kvfree(const void *addr)
{
         if (is_vmalloc_addr(addr))
                 vfree(addr);
         else
                 kfree(addr);
}

In vfree(), in_interrupt() and might_sleep() need to be considered.
void vfree(const void *addr)
{
         // ...
         if (unlikely(in_interrupt()))
         {
                 vfree_atomic(addr);
                 return;
         }
         // ...
         might_sleep();
         // ...
}

Sorry. I didn't check vfree path. So it does look like that
we need to pay special attention to non interrupt part.


The vfree() may sleep if in_interrupt() == false. The RCU read lock region
could have in_interrupt() == false and spin lock region which only disables
preemption also has in_interrupt() == false. So the kvfree() cannot appear
inside a spin lock region or RCU read lock region if the pointer addr points
to memory allocated by vmalloc().

 > > Therefore, we propose modifying the condition to include
 > > in_atomic(). Could we
 > > update the condition as follows: "in_irq() || irqs_disabled() ||
 > > in_atomic()"?
 > Thank you! We look forward to your feedback.

We now think that ‘irqs_disabled() || in_atomic() || rcu_read_lock_held()’ is
more proper. irqs_disabled() is for irq flag reg, in_atomic() is for
preempt count and rcu_read_lock_held() is for RCU read lock region.

We cannot use rcu_read_lock_held() in the 'if' statement. The return
value rcu_read_lock_held() could be 1 for some configuraitons regardless
whether rcu_read_lock() is really held or not. In most cases,
rcu_read_lock_held() is used in issuing potential warnings.
Maybe there are other ways to record whether rcu_read_lock() is held or not?

I agree with your that 'irqs_disabled() || in_atomic()' makes sense
since it covers process context local_irq_save() and spin_lock() cases.

If we cannot resolve rcu_read_lock() presence issue, maybe the condition
can be !in_interrupt(), so any process-context will go to a workqueue.

Alternatively, we could have another solution. We could add another
function e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put()
will be done in rcu context. So if in_interrupt(), do kvfree, otherwise,
put into a workqueue.



-- Teng Qi

On Sun, May 21, 2023 at 11:45 AM Yonghong Song <yhs@xxxxxxxx <mailto:yhs@xxxxxxxx>> wrote:



    On 5/19/23 7:18 AM, Teng Qi wrote:
     > Thank you for your response.
     >  > Looks like you only have suspicion here. Could you find a real
    violation
     >  > here where __bpf_prog_put() is called with !in_irq() &&
     >  > !irqs_disabled(), but inside spin_lock or rcu read lock? I
    have not seen
     >  > things like that.
     >
     > For the complex conditions to call bpf_prog_put() with 1 refcnt,
    we have
     > been
     > unable to really trigger this atomic violation after trying to
    construct
     > test cases manually. But we found that it is possible to show
    cases with
     > !in_irq() && !irqs_disabled(), but inside spin_lock or rcu read lock.
     > For example, even a failed case, one of selftest cases of bpf,
    netns_cookie,
     > calls bpf_sock_map_update() and may indirectly call bpf_prog_put()
     > only inside rcu read lock: The possible call stack is:
     > net/core/sock_map.c: 615 bpf_sock_map_update()
     > net/core/sock_map.c: 468 sock_map_update_common()
     > net/core/sock_map.c:  217 sock_map_link()
     > kernel/bpf/syscall.c: 2111 bpf_prog_put()
     >
     > The files about netns_cookie include
     > tools/testing/selftests/bpf/progs/netns_cookie_prog.c and
     > tools/testing/selftests/bpf/prog_tests/netns_cookie.c. We
    inserted the
     > following code in
     > ‘net/core/sock_map.c: 468 sock_map_update_common()’:
     > static int sock_map_update_common(..)
     > {
     >          int inIrq = in_irq();
     >          int irqsDisabled = irqs_disabled();
     >          int preemptBits = preempt_count();
     >          int inAtomic = in_atomic();
     >          int rcuHeld = rcu_read_lock_held();
     >          printk("in_irq() %d, irqs_disabled() %d, preempt_count() %d,
     >            in_atomic() %d, rcu_read_lock_held() %d", inIrq,
    irqsDisabled,
     >            preemptBits, inAtomic, rcuHeld);
     > }
     >
     > The output message is as follows:
     > root@(none):/root/bpf# ./test_progs -t netns_cookie
     > [  137.639188] in_irq() 0, irqs_disabled() 0, preempt_count() 0,
     > in_atomic() 0,
     >          rcu_read_lock_held() 1
     > #113     netns_cookie:OK
     > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
     >
     > We notice that there are numerous callers in kernel/, net/ and
    drivers/,
     > so we
     > highly suggest modifying __bpf_prog_put() to address this gap.
    The gap
     > exists
     > because __bpf_prog_put() is only safe under in_irq() ||
    irqs_disabled()
     > but not in_atomic() || rcu_read_lock_held(). The following code
    snippet may
     > mislead developers into thinking that bpf_prog_put() is safe in all
     > contexts.
     > if (in_irq() || irqs_disabled()) {
     >          INIT_WORK(&aux->work, bpf_prog_put_deferred);
     >          schedule_work(&aux->work);
     > } else {
     >          bpf_prog_put_deferred(&aux->work);
     > }
     >
     > Implicit dependency may lead to issues.
     >
     >  > Any problem here?
     > We mentioned it to demonstrate the possibility of kvfree() being
     > called by __bpf_prog_put_noref().
     >
     > Thanks.
     > -- Teng Qi
     >
     > On Wed, May 17, 2023 at 1:08 AM Yonghong Song <yhs@xxxxxxxx
    <mailto:yhs@xxxxxxxx>
     > <mailto:yhs@xxxxxxxx <mailto:yhs@xxxxxxxx>>> wrote:
     >
     >
     >
     >     On 5/16/23 4:18 AM, starmiku1207184332@xxxxxxxxx
    <mailto:starmiku1207184332@xxxxxxxxx>
     >     <mailto:starmiku1207184332@xxxxxxxxx
    <mailto:starmiku1207184332@xxxxxxxxx>> wrote:
     >      > From: Teng Qi <starmiku1207184332@xxxxxxxxx
    <mailto:starmiku1207184332@xxxxxxxxx>
     >     <mailto:starmiku1207184332@xxxxxxxxx
    <mailto:starmiku1207184332@xxxxxxxxx>>>
     >      >
     >      > Hi, bpf developers,
     >      >
     >      > We are developing a static tool to check the matching between
     >     helpers and the
     >      > context of hooks. During our analysis, we have discovered some
     >     important
     >      > findings that we would like to report.
     >      >
     >      > ‘kernel/bpf/syscall.c: 2097 __bpf_prog_put()’ shows that
    function
     >      > bpf_prog_put_deferred() won`t be called in the condition of
     >      > ‘in_irq() || irqs_disabled()’.
     >      > if (in_irq() || irqs_disabled()) {
     >      >      INIT_WORK(&aux->work, bpf_prog_put_deferred);
     >      >      schedule_work(&aux->work);
     >      > } else {
     >      >
     >      >      bpf_prog_put_deferred(&aux->work);
     >      > }
     >      >
     >      > We suspect this condition exists because there might be
    sleepable
     >     operations
     >      > in the callees of the bpf_prog_put_deferred() function:
     >      > kernel/bpf/syscall.c: 2097 __bpf_prog_put()
     >      > kernel/bpf/syscall.c: 2084 bpf_prog_put_deferred()
     >      > kernel/bpf/syscall.c: 2063 __bpf_prog_put_noref()
     >      > kvfree(prog->aux->jited_linfo);
     >      > kvfree(prog->aux->linfo);
     >
     >     Looks like you only have suspicion here. Could you find a real
     >     violation
     >     here where __bpf_prog_put() is called with !in_irq() &&
     >     !irqs_disabled(), but inside spin_lock or rcu read lock? I
    have not seen
     >     things like that.
     >
     >      >
     >      > Additionally, we found that array prog->aux->jited_linfo is
     >     initialized in
     >      > ‘kernel/bpf/core.c: 157 bpf_prog_alloc_jited_linfo()’:
     >      > prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
     >      >    sizeof(*prog->aux->jited_linfo),
    bpf_memcg_flags(GFP_KERNEL |
     >     __GFP_NOWARN));
     >
     >     Any problem here?
     >
     >      >
     >      > Our question is whether the condition 'in_irq() ||
     >     irqs_disabled() == false' is
     >      > sufficient for calling 'kvfree'. We are aware that calling
     >     'kvfree' within the
     >      > context of a spin lock or an RCU lock is unsafe.

    Your above analysis makes sense if indeed that kvfree cannot appear
    inside a spin lock region or RCU read lock region. But is it true?
    I checked a few code paths in kvfree/kfree. It is either guarded
    with local_irq_save/restore or by
    spin_lock_irqsave/spin_unlock_irqrestore, etc. Did I miss
    anything? Are you talking about RT kernel here?


     >      >
     >      > Therefore, we propose modifying the condition to include
     >     in_atomic(). Could we
     >      > update the condition as follows: "in_irq() ||
    irqs_disabled() ||
     >     in_atomic()"?
     >      >
     >      > Thank you! We look forward to your feedback.
     >      >
     >      > Signed-off-by: Teng Qi <starmiku1207184332@xxxxxxxxx
    <mailto:starmiku1207184332@xxxxxxxxx>
     >     <mailto:starmiku1207184332@xxxxxxxxx
    <mailto:starmiku1207184332@xxxxxxxxx>>>
     >





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux