Hello!

> BTW, please do create a test case, e.g., a sockmap test case which
> can show the problem with the existing code base.

I added a printk in bpf_prog_put_deferred():

static void bpf_prog_put_deferred(struct work_struct *work)
{
	// ...
	int inIrq = in_irq();
	int irqsDisabled = irqs_disabled();
	int preemptBits = preempt_count();
	int inAtomic = in_atomic();
	int rcuHeld = rcu_read_lock_held();

	printk("bpf_prog_put: in_irq() %d, irqs_disabled() %d, preempt_count() %d, in_atomic() %d, rcu_read_lock_held() %d",
	       inIrq, irqsDisabled, preemptBits, inAtomic, rcuHeld);
	// ...
}

When running the selftests, I see the following output:

[255340.388339] bpf_prog_put: in_irq() 0, irqs_disabled() 0, preempt_count() 256, in_atomic() 1, rcu_read_lock_held() 1
[255393.237632] bpf_prog_put: in_irq() 0, irqs_disabled() 0, preempt_count() 0, in_atomic() 0, rcu_read_lock_held() 1

Based on this output, I believe it is feasible to construct a self-test case
in which bpf_prog_put_deferred() is called with preemption disabled or inside
an RCU read lock region. However, I am unsure what form the self-test should
take. Are we looking to create a checker that verifies that the context in
which bpf_prog_put_deferred() runs is valid, or do we need a test case that
actually triggers this bug? Could you share more ideas on how to construct
such a self-test? I am not familiar with the selftest framework. (To make my
question concrete, I have appended a rough sketch of the checker-style option
after the quoted thread below.)

-- Teng Qi

On Thu, May 25, 2023 at 3:34 AM Yonghong Song <yhs@xxxxxxxx> wrote:
>
> On 5/24/23 5:42 AM, Teng Qi wrote:
> > Thank you.
> >
> >> We cannot use rcu_read_lock_held() in the 'if' statement. The return
> >> value of rcu_read_lock_held() could be 1 for some configurations regardless
> >> of whether rcu_read_lock() is really held or not. In most cases,
> >> rcu_read_lock_held() is used in issuing potential warnings.
> >> Maybe there are other ways to record whether rcu_read_lock() is held or not?
> >
> > Sorry, I was not aware that rcu_read_lock_held() depends on the kernel
> > configuration.
> >
> >> If we cannot resolve the rcu_read_lock() presence issue, maybe the condition
> >> can be !in_interrupt(), so any process context will go to a workqueue.
> >
> > I agree that using !in_interrupt() as the condition is an acceptable solution.
>
> This should work, although it could be conservative.
>
> >> Alternatively, we could have another solution. We could add another
> >> function, e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put()
> >> will be done in rcu context.
> >
> > Implementing a new function like bpf_prog_put_rcu() is a solution that
> > involves more significant changes.
>
> Maybe we can change the signature of bpf_prog_put() instead? Like
>     void bpf_prog_put(struct bpf_prog *prog, bool in_rcu)
> and inside bpf_prog_put we can add
>     WARN_ON_ONCE(in_rcu && !bpf_rcu_lock_held());
>
> >> So if in_interrupt(), do kvfree, otherwise,
> >> put into a workqueue.
> >
> > Shall we proceed with submitting a patch following this approach?
>
> You could choose either of the above, although I think the new
> bpf_prog_put() signature is better.
>
> BTW, please do create a test case, e.g., a sockmap test case which
> can show the problem with the existing code base.
>
> >
> > I would like to mention something unrelated to the possible bug. At this
> > moment, things seem to be more puzzling: vfree() is safe under in_interrupt()
> > but not safe under other atomic contexts.
> > This disorder challenges our conventional belief in a monotonic increase
> > of restrictions across the hierarchy of atomic contexts, i.e. that a
> > programmer needs to be more and more careful when writing code under an
> > RCU read lock, a spin lock, bh disable, an interrupt, and so on.
> > This disorder can lead to unexpected consequences, such as code being safe
> > under interrupts but not safe under spin locks.
> > The disorder makes kernel programming more complex and may result in more
> > bugs. Even if we find a way to resolve the possible bug around
> > bpf_prog_put(), I am saddened by the undermining of the kernel's
> > maintainability and the disorder in the hierarchy of atomic contexts.
> >
> > -- Teng Qi
> >
> > On Tue, May 23, 2023 at 12:33 PM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>
> >> On 5/21/23 6:39 AM, Teng Qi wrote:
> >>> Thank you.
> >>>
> >>> > Your above analysis makes sense if indeed kvfree cannot appear
> >>> > inside a spin lock region or RCU read lock region. But is it true?
> >>> > I checked a few code paths in kvfree/kfree. It is either guarded
> >>> > with local_irq_save/restore or by
> >>> > spin_lock_irqsave/spin_unlock_irqrestore, etc. Did I miss
> >>> > anything? Are you talking about RT kernel here?
> >>>
> >>> To see whether kvfree() may sleep, it is important to analyze the
> >>> following call chain:
> >>>   mm/util.c: 645 kvfree()
> >>>   mm/vmalloc.c: 2763 vfree()
> >>>
> >>> In kvfree(), if the pointer addr points to memory allocated by
> >>> vmalloc(), it calls vfree():
> >>>   void kvfree(const void *addr)
> >>>   {
> >>>           if (is_vmalloc_addr(addr))
> >>>                   vfree(addr);
> >>>           else
> >>>                   kfree(addr);
> >>>   }
> >>>
> >>> In vfree(), in_interrupt() and might_sleep() need to be considered:
> >>>   void vfree(const void *addr)
> >>>   {
> >>>           // ...
> >>>           if (unlikely(in_interrupt())) {
> >>>                   vfree_atomic(addr);
> >>>                   return;
> >>>           }
> >>>           // ...
> >>>           might_sleep();
> >>>           // ...
> >>>   }
> >>
> >> Sorry, I didn't check the vfree path. So it does look like we need to
> >> pay special attention to the non-interrupt part.
> >>
> >>> The vfree() may sleep if in_interrupt() == false. An RCU read lock region
> >>> can have in_interrupt() == false, and a spin lock region, which only
> >>> disables preemption, also has in_interrupt() == false. So kvfree() cannot
> >>> appear inside a spin lock region or RCU read lock region if the pointer
> >>> addr points to memory allocated by vmalloc().
> >>>
> >>> > > Therefore, we propose modifying the condition to include in_atomic().
> >>> > > Could we update the condition as follows:
> >>> > > "in_irq() || irqs_disabled() || in_atomic()"?
> >>> > Thank you! We look forward to your feedback.
> >>>
> >>> We now think that 'irqs_disabled() || in_atomic() || rcu_read_lock_held()'
> >>> is more proper: irqs_disabled() is for the irq flag register, in_atomic()
> >>> is for the preempt count, and rcu_read_lock_held() is for RCU read lock
> >>> regions.
> >>
> >> We cannot use rcu_read_lock_held() in the 'if' statement. The return
> >> value of rcu_read_lock_held() could be 1 for some configurations regardless
> >> of whether rcu_read_lock() is really held or not. In most cases,
> >> rcu_read_lock_held() is used in issuing potential warnings.
> >> Maybe there are other ways to record whether rcu_read_lock() is held or not?
> >>
> >> I agree with you that 'irqs_disabled() || in_atomic()' makes sense
> >> since it covers the process-context local_irq_save() and spin_lock() cases.
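[Aside, to make the widened check discussed just above concrete: below is a
rough, untested sketch of __bpf_prog_put() with in_atomic() added to the
deferral condition. The surrounding refcount handling is simplified here, and
only the condition differs from the snippet quoted further down in this
thread. Note that in_atomic() still would not catch rcu_read_lock() on
non-preemptible kernels, which is why !in_interrupt() and an explicit
"in_rcu" argument are also being discussed.]

static void __bpf_prog_put(struct bpf_prog *prog)
{
	struct bpf_prog_aux *aux = prog->aux;

	if (atomic64_dec_and_test(&aux->refcnt)) {
		/* was: in_irq() || irqs_disabled() */
		if (in_irq() || irqs_disabled() || in_atomic()) {
			/* Caller might not be allowed to sleep: defer the
			 * final free (which may reach vfree()) to a workqueue. */
			INIT_WORK(&aux->work, bpf_prog_put_deferred);
			schedule_work(&aux->work);
		} else {
			bpf_prog_put_deferred(&aux->work);
		}
	}
}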
> >>
> >> If we cannot resolve the rcu_read_lock() presence issue, maybe the condition
> >> can be !in_interrupt(), so any process context will go to a workqueue.
> >>
> >> Alternatively, we could have another solution. We could add another
> >> function, e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put()
> >> will be done in rcu context. So if in_interrupt(), do kvfree; otherwise,
> >> put it into a workqueue.
> >>
> >>>
> >>> -- Teng Qi
> >>>
> >>> On Sun, May 21, 2023 at 11:45 AM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>>
> >>> On 5/19/23 7:18 AM, Teng Qi wrote:
> >>> > Thank you for your response.
> >>> > > Looks like you only have suspicion here. Could you find a real
> >>> > > violation here where __bpf_prog_put() is called with !in_irq() &&
> >>> > > !irqs_disabled(), but inside spin_lock or rcu read lock? I have not
> >>> > > seen things like that.
> >>> >
> >>> > Given the complex conditions required to call bpf_prog_put() with a
> >>> > refcnt of 1, we have been unable to actually trigger this atomic
> >>> > violation after trying to construct test cases manually. But we found
> >>> > that it is possible to show cases with !in_irq() && !irqs_disabled(),
> >>> > but inside spin_lock or rcu read lock.
> >>> > For example, one of the bpf selftest cases, netns_cookie (although it
> >>> > does not actually trigger the violation), calls bpf_sock_map_update()
> >>> > and may indirectly call bpf_prog_put() only inside an rcu read lock.
> >>> > The possible call stack is:
> >>> >   net/core/sock_map.c: 615 bpf_sock_map_update()
> >>> >   net/core/sock_map.c: 468 sock_map_update_common()
> >>> >   net/core/sock_map.c: 217 sock_map_link()
> >>> >   kernel/bpf/syscall.c: 2111 bpf_prog_put()
> >>> >
> >>> > The files related to netns_cookie include
> >>> > tools/testing/selftests/bpf/progs/netns_cookie_prog.c and
> >>> > tools/testing/selftests/bpf/prog_tests/netns_cookie.c. We inserted the
> >>> > following code in 'net/core/sock_map.c: 468 sock_map_update_common()':
> >>> >   static int sock_map_update_common(..)
> >>> >   {
> >>> >           int inIrq = in_irq();
> >>> >           int irqsDisabled = irqs_disabled();
> >>> >           int preemptBits = preempt_count();
> >>> >           int inAtomic = in_atomic();
> >>> >           int rcuHeld = rcu_read_lock_held();
> >>> >           printk("in_irq() %d, irqs_disabled() %d, preempt_count() %d, in_atomic() %d, rcu_read_lock_held() %d",
> >>> >                  inIrq, irqsDisabled, preemptBits, inAtomic, rcuHeld);
> >>> >   }
> >>> >
> >>> > The output message is as follows:
> >>> >   root@(none):/root/bpf# ./test_progs -t netns_cookie
> >>> >   [  137.639188] in_irq() 0, irqs_disabled() 0, preempt_count() 0, in_atomic() 0, rcu_read_lock_held() 1
> >>> >   #113 netns_cookie:OK
> >>> >   Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
> >>> >
> >>> > We notice that there are numerous callers in kernel/, net/ and
> >>> > drivers/, so we highly suggest modifying __bpf_prog_put() to address
> >>> > this gap. The gap exists because __bpf_prog_put() is only safe under
> >>> > in_irq() || irqs_disabled() but not under in_atomic() ||
> >>> > rcu_read_lock_held(). The following code snippet may mislead developers
> >>> > into thinking that bpf_prog_put() is safe in all contexts:
> >>> >   if (in_irq() || irqs_disabled()) {
> >>> >           INIT_WORK(&aux->work, bpf_prog_put_deferred);
> >>> >           schedule_work(&aux->work);
> >>> >   } else {
> >>> >           bpf_prog_put_deferred(&aux->work);
> >>> >   }
> >>> >
> >>> > Such an implicit dependency may lead to issues.
> >>> >
> >>> > > Any problem here?
> >>> >
> >>> > We mentioned it to demonstrate the possibility of kvfree() being
> >>> > called by __bpf_prog_put_noref().
> >>> >
> >>> > Thanks.
> >>> > -- Teng Qi
> >>> >
> >>> > On Wed, May 17, 2023 at 1:08 AM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>> >
> >>> > On 5/16/23 4:18 AM, starmiku1207184332@xxxxxxxxx wrote:
> >>> > > From: Teng Qi <starmiku1207184332@xxxxxxxxx>
> >>> > >
> >>> > > Hi, bpf developers,
> >>> > >
> >>> > > We are developing a static tool to check the matching between
> >>> > > helpers and the context of hooks. During our analysis, we have
> >>> > > discovered some important findings that we would like to report.
> >>> > >
> >>> > > 'kernel/bpf/syscall.c: 2097 __bpf_prog_put()' shows that the function
> >>> > > bpf_prog_put_deferred() won't be called directly when
> >>> > > 'in_irq() || irqs_disabled()' holds:
> >>> > >   if (in_irq() || irqs_disabled()) {
> >>> > >           INIT_WORK(&aux->work, bpf_prog_put_deferred);
> >>> > >           schedule_work(&aux->work);
> >>> > >   } else {
> >>> > >           bpf_prog_put_deferred(&aux->work);
> >>> > >   }
> >>> > >
> >>> > > We suspect this condition exists because there might be sleepable
> >>> > > operations in the callees of the bpf_prog_put_deferred() function:
> >>> > >   kernel/bpf/syscall.c: 2097 __bpf_prog_put()
> >>> > >   kernel/bpf/syscall.c: 2084 bpf_prog_put_deferred()
> >>> > >   kernel/bpf/syscall.c: 2063 __bpf_prog_put_noref()
> >>> > >     kvfree(prog->aux->jited_linfo);
> >>> > >     kvfree(prog->aux->linfo);
> >>> >
> >>> > Looks like you only have suspicion here. Could you find a real
> >>> > violation here where __bpf_prog_put() is called with !in_irq() &&
> >>> > !irqs_disabled(), but inside spin_lock or rcu read lock? I have not
> >>> > seen things like that.
> >>> >
> >>> > > Additionally, we found that the array prog->aux->jited_linfo is
> >>> > > initialized in 'kernel/bpf/core.c: 157 bpf_prog_alloc_jited_linfo()':
> >>> > >   prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
> >>> > >           sizeof(*prog->aux->jited_linfo),
> >>> > >           bpf_memcg_flags(GFP_KERNEL | __GFP_NOWARN));
> >>> >
> >>> > Any problem here?
> >>> >
> >>> > > Our question is whether the condition 'in_irq() || irqs_disabled() ==
> >>> > > false' is sufficient for calling 'kvfree'. We are aware that calling
> >>> > > 'kvfree' within the context of a spin lock or an RCU lock is unsafe.
> >>>
> >>> Your above analysis makes sense if indeed kvfree cannot appear
> >>> inside a spin lock region or RCU read lock region. But is it true?
> >>> I checked a few code paths in kvfree/kfree. It is either guarded
> >>> with local_irq_save/restore or by spin_lock_irqsave/spin_unlock_irqrestore,
> >>> etc. Did I miss anything? Are you talking about the RT kernel here?
> >>>
> >>> > > Therefore, we propose modifying the condition to include in_atomic().
> >>> > > Could we update the condition as follows:
> >>> > > "in_irq() || irqs_disabled() || in_atomic()"?
> >>> > >
> >>> > > Thank you! We look forward to your feedback.
> >>> > >
> >>> > > Signed-off-by: Teng Qi <starmiku1207184332@xxxxxxxxx>
> >>> >
> >>>
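[Appendix to my question at the top of this mail: to make the "checker"
option concrete for myself, here is a rough, untested sketch of the
bpf_prog_put() signature change and WARN_ON_ONCE() check that Yonghong
suggested in the quoted mail above. bpf_rcu_lock_held() is used exactly as
named in that suggestion, and the existing put path is left untouched; the
"in_rcu" flag would additionally be used to decide whether the final free
must be deferred to a workqueue. A self-test could then exercise callers
such as the sockmap update path discussed above while holding an RCU read
lock.]

void bpf_prog_put(struct bpf_prog *prog, bool in_rcu)
{
	/* Catch callers that claim an RCU read-side context they do not hold. */
	WARN_ON_ONCE(in_rcu && !bpf_rcu_lock_held());

	/* Existing put/deferral logic, unchanged in this sketch. */
	__bpf_prog_put(prog);
}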