Hello! > It would be great if you also print out in_interrupt() value, so we know > whether softirq or nmi is enabled or not. After adding the in_interrupt(), the interesting output cases are as follows: [ 38.596580] bpf_prog_put: in_irq() 0, irqs_disabled() 0, in_interrupt() 256, preempt_count() 256, in_atomic() 1, rcu_read_lock_held() 0 [ 62.300608] bpf_prog_put: in_irq() 0, irqs_disabled() 0, in_interrupt() 256, preempt_count() 256, in_atomic() 1, rcu_read_lock_held() 1 [ 62.301179] bpf_prog_put: in_irq() 0, irqs_disabled() 0, in_interrupt() 0, preempt_count() 0, in_atomic() 0, rcu_read_lock_held() 1 Based on these cases, the current code is safe for the first two cases, because in_interrupt() in vfree() prevents sleeping. However, the rcu_read_lock_held() is not reliable, so we cannot rely on it. Considering all the discussions so far, I think the best plan now is to change the condition in __bpf_prog_put() to ‘irqs_disabled() || in_atomic()’ and provide examples for possible issues. This plan effectively addresses more possible atomic contexts of __bpf_prog_put() without incurring any additional cost. -- Teng Qi On Mon, Jun 12, 2023 at 8:02 AM Yonghong Song <yhs@xxxxxxxx> wrote: > > > > On 6/11/23 6:02 AM, Teng Qi wrote: > > Hello! > >> BTW, please do create a test case, e.g, sockmap test case which > >> can show the problem with existing code base. > > > > I add a printk in bpf_prog_put_deferred(): > > static void bpf_prog_put_deferred(struct work_struct *work) > > { > > // . . . > > int inIrq = in_irq(); > > int irqsDisabled = irqs_disabled(); > > int preemptBits = preempt_count(); > > int inAtomic = in_atomic(); > > int rcuHeld = rcu_read_lock_held(); > > printk("bpf_prog_put: in_irq() %d, irqs_disabled() %d, preempt_count() > > %d, in_atomic() %d, rcu_read_lock_held() %d", > > inIrq, irqsDisabled, preemptBits, inAtomic, rcuHeld); > > // . . . > > } > > > > When running the selftest, I see the following output: > > [255340.388339] bpf_prog_put: in_irq() 0, irqs_disabled() 0, > > preempt_count() 256, in_atomic() 1, rcu_read_lock_held() 1 > > [255393.237632] bpf_prog_put: in_irq() 0, irqs_disabled() 0, > > preempt_count() 0, in_atomic() 0, rcu_read_lock_held() 1 > > It would be great if you also print out in_interrupt() value, so we know > whether softirq or nmi is enabled or not. > > We cannot really WARN with !rcu_read_lock_held() since the > __bpf_prog_put funciton is called in different contexts. > > Also, note that rcu_read_lock_held() may not be reliable. rcu subsystem > will return 1 if not tracked or not sure about the result. > > > > > Based on this output, I believe it is sufficient to construct a self-test case > > for bpf_prog_put_deferred() called under preempt disabled or rcu read lock > > region. However, I'm a bit confused about what I should do to build the > > self-test case. Are we looking to create a checker that verifies the > > context of bpf_prog_put_deferred() is valid? Or do we need a test case that > > can trigger this bug? > > > > Could you show me more ideas to construct a self test case? I am not familiar > > with it and have no idea. > > Okay, I see. It seems hard to create a test case with warnings since > bpf_prog_put_deferred is called in different context. So some > examples for possible issues (through code analysis) should be good enough. > > > > > -- Teng Qi > > > > On Thu, May 25, 2023 at 3:34 AM Yonghong Song <yhs@xxxxxxxx> wrote: > >> > >> > >> > >> On 5/24/23 5:42 AM, Teng Qi wrote: > >>> Thank you. > >>> > >>>> We cannot use rcu_read_lock_held() in the 'if' statement. The return > >>>> value rcu_read_lock_held() could be 1 for some configurations regardless > >>>> whether rcu_read_lock() is really held or not. In most cases, > >>>> rcu_read_lock_held() is used in issuing potential warnings. > >>>> Maybe there are other ways to record whether rcu_read_lock() is held or not? > >>> > >>> Sorry. I was not aware of the dependency of configurations of > >>> rcu_read_lock_held(). > >>> > >>>> If we cannot resolve rcu_read_lock() presence issue, maybe the condition > >>>> can be !in_interrupt(), so any process-context will go to a workqueue. > >>> > >>> I agree that using !in_interrupt() as a condition is an acceptable solution. > >> > >> This should work although it could be conservative. > >> > >>> > >>>> Alternatively, we could have another solution. We could add another > >>>> function e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put() > >>>> will be done in rcu context. > >>> > >>> Implementing a new function like bpf_prog_put_rcu() is a solution that involves > >>> more significant changes. > >> > >> Maybe we can change signature of bpf_prog_put instead? Like > >> void bpf_prog_put(struct bpf_prog *prog, bool in_rcu) > >> and inside bpf_prog_put we can add > >> WARN_ON_ONCE(in_rcu && !bpf_rcu_lock_held()); > >> > >>> > >>>> So if in_interrupt(), do kvfree, otherwise, > >>>> put into a workqueue. > >>> > >>> Shall we proceed with submitting a patch following this approach? > >> > >> You could choose either of the above although I think with newer > >> bpf_prog_put() is better. > >> > >> BTW, please do create a test case, e.g, sockmap test case which > >> can show the problem with existing code base. > >> > >>> > >>> I would like to mention something unrelated to the possible bug. At this > >>> moment, things seem to be more puzzling. vfree() is safe under in_interrupt() > >>> but not safe under other atomic contexts. > >>> This disorder challenges our conventional belief, a monotonic incrementation > >>> of limitations of the hierarchical atomic contexts, that programer needs > >>> to be more and more careful to write code under rcu read lock, spin lock, > >>> bh disable, interrupt... > >>> This disorder can lead to unexpected consequences, such as code being safe > >>> under interrupts but not safe under spin locks. > >>> The disorder makes kernel programming more complex and may result in more bugs. > >>> Even though we find a way to resolve the possible bug about the bpf_prog_put(), > >>> I feel sad for undermining of kernel`s maintainability and disorder of > >>> hierarchy of atomic contexts. > >>> > >>> -- Teng Qi > >>> > >>> On Tue, May 23, 2023 at 12:33 PM Yonghong Song <yhs@xxxxxxxx> wrote: > >>>> > >>>> > >>>> > >>>> On 5/21/23 6:39 AM, Teng Qi wrote: > >>>>> Thank you. > >>>>> > >>>>> > Your above analysis makes sense if indeed that kvfree cannot appear > >>>>> > inside a spin lock region or RCU read lock region. But is it true? > >>>>> > I checked a few code paths in kvfree/kfree. It is either guarded > >>>>> > with local_irq_save/restore or by > >>>>> > spin_lock_irqsave/spin_unlock_ > >>>>> > irqrestore, etc. Did I miss > >>>>> > anything? Are you talking about RT kernel here? > >>>>> > >>>>> To see the sleepable possibility of kvfree, it is important to analyze the > >>>>> following calling stack: > >>>>> mm/util.c: 645 kvfree() > >>>>> mm/vmalloc.c: 2763 vfree() > >>>>> > >>>>> In kvfree(), to call vfree, if the pointer addr points to memory > >>>>> allocated by > >>>>> vmalloc(), it calls vfree(). > >>>>> void kvfree(const void *addr) > >>>>> { > >>>>> if (is_vmalloc_addr(addr)) > >>>>> vfree(addr); > >>>>> else > >>>>> kfree(addr); > >>>>> } > >>>>> > >>>>> In vfree(), in_interrupt() and might_sleep() need to be considered. > >>>>> void vfree(const void *addr) > >>>>> { > >>>>> // ... > >>>>> if (unlikely(in_interrupt())) > >>>>> { > >>>>> vfree_atomic(addr); > >>>>> return; > >>>>> } > >>>>> // ... > >>>>> might_sleep(); > >>>>> // ... > >>>>> } > >>>> > >>>> Sorry. I didn't check vfree path. So it does look like that > >>>> we need to pay special attention to non interrupt part. > >>>> > >>>>> > >>>>> The vfree() may sleep if in_interrupt() == false. The RCU read lock region > >>>>> could have in_interrupt() == false and spin lock region which only disables > >>>>> preemption also has in_interrupt() == false. So the kvfree() cannot appear > >>>>> inside a spin lock region or RCU read lock region if the pointer addr points > >>>>> to memory allocated by vmalloc(). > >>>>> > >>>>> > > Therefore, we propose modifying the condition to include > >>>>> > > in_atomic(). Could we > >>>>> > > update the condition as follows: "in_irq() || irqs_disabled() || > >>>>> > > in_atomic()"? > >>>>> > Thank you! We look forward to your feedback. > >>>>> > >>>>> We now think that ‘irqs_disabled() || in_atomic() || > >>>>> rcu_read_lock_held()’ is > >>>>> more proper. irqs_disabled() is for irq flag reg, in_atomic() is for > >>>>> preempt count and rcu_read_lock_held() is for RCU read lock region. > >>>> > >>>> We cannot use rcu_read_lock_held() in the 'if' statement. The return > >>>> value rcu_read_lock_held() could be 1 for some configuraitons regardless > >>>> whether rcu_read_lock() is really held or not. In most cases, > >>>> rcu_read_lock_held() is used in issuing potential warnings. > >>>> Maybe there are other ways to record whether rcu_read_lock() is held or not? > >>>> > >>>> I agree with your that 'irqs_disabled() || in_atomic()' makes sense > >>>> since it covers process context local_irq_save() and spin_lock() cases. > >>>> > >>>> If we cannot resolve rcu_read_lock() presence issue, maybe the condition > >>>> can be !in_interrupt(), so any process-context will go to a workqueue. > >>>> > >>>> Alternatively, we could have another solution. We could add another > >>>> function e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put() > >>>> will be done in rcu context. So if in_interrupt(), do kvfree, otherwise, > >>>> put into a workqueue. > >>>> > >>>> > >>>>> > >>>>> -- Teng Qi > >>>>> > >>>>> On Sun, May 21, 2023 at 11:45 AM Yonghong Song <yhs@xxxxxxxx > >>>>> <mailto:yhs@xxxxxxxx>> wrote: > >>>>> > >>>>> > >>>>> > >>>>> On 5/19/23 7:18 AM, Teng Qi wrote: > >>>>> > Thank you for your response. > >>>>> > > Looks like you only have suspicion here. Could you find a real > >>>>> violation > >>>>> > > here where __bpf_prog_put() is called with !in_irq() && > >>>>> > > !irqs_disabled(), but inside spin_lock or rcu read lock? I > >>>>> have not seen > >>>>> > > things like that. > >>>>> > > >>>>> > For the complex conditions to call bpf_prog_put() with 1 refcnt, > >>>>> we have > >>>>> > been > >>>>> > unable to really trigger this atomic violation after trying to > >>>>> construct > >>>>> > test cases manually. But we found that it is possible to show > >>>>> cases with > >>>>> > !in_irq() && !irqs_disabled(), but inside spin_lock or rcu read lock. > >>>>> > For example, even a failed case, one of selftest cases of bpf, > >>>>> netns_cookie, > >>>>> > calls bpf_sock_map_update() and may indirectly call bpf_prog_put() > >>>>> > only inside rcu read lock: The possible call stack is: > >>>>> > net/core/sock_map.c: 615 bpf_sock_map_update() > >>>>> > net/core/sock_map.c: 468 sock_map_update_common() > >>>>> > net/core/sock_map.c: 217 sock_map_link() > >>>>> > kernel/bpf/syscall.c: 2111 bpf_prog_put() > >>>>> > > >>>>> > The files about netns_cookie include > >>>>> > tools/testing/selftests/bpf/progs/netns_cookie_prog.c and > >>>>> > tools/testing/selftests/bpf/prog_tests/netns_cookie.c. We > >>>>> inserted the > >>>>> > following code in > >>>>> > ‘net/core/sock_map.c: 468 sock_map_update_common()’: > >>>>> > static int sock_map_update_common(..) > >>>>> > { > >>>>> > int inIrq = in_irq(); > >>>>> > int irqsDisabled = irqs_disabled(); > >>>>> > int preemptBits = preempt_count(); > >>>>> > int inAtomic = in_atomic(); > >>>>> > int rcuHeld = rcu_read_lock_held(); > >>>>> > printk("in_irq() %d, irqs_disabled() %d, preempt_count() %d, > >>>>> > in_atomic() %d, rcu_read_lock_held() %d", inIrq, > >>>>> irqsDisabled, > >>>>> > preemptBits, inAtomic, rcuHeld); > >>>>> > } > >>>>> > > >>>>> > The output message is as follows: > >>>>> > root@(none):/root/bpf# ./test_progs -t netns_cookie > >>>>> > [ 137.639188] in_irq() 0, irqs_disabled() 0, preempt_count() 0, > >>>>> > in_atomic() 0, > >>>>> > rcu_read_lock_held() 1 > >>>>> > #113 netns_cookie:OK > >>>>> > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED > >>>>> > > >>>>> > We notice that there are numerous callers in kernel/, net/ and > >>>>> drivers/, > >>>>> > so we > >>>>> > highly suggest modifying __bpf_prog_put() to address this gap. > >>>>> The gap > >>>>> > exists > >>>>> > because __bpf_prog_put() is only safe under in_irq() || > >>>>> irqs_disabled() > >>>>> > but not in_atomic() || rcu_read_lock_held(). The following code > >>>>> snippet may > >>>>> > mislead developers into thinking that bpf_prog_put() is safe in all > >>>>> > contexts. > >>>>> > if (in_irq() || irqs_disabled()) { > >>>>> > INIT_WORK(&aux->work, bpf_prog_put_deferred); > >>>>> > schedule_work(&aux->work); > >>>>> > } else { > >>>>> > bpf_prog_put_deferred(&aux->work); > >>>>> > } > >>>>> > > >>>>> > Implicit dependency may lead to issues. > >>>>> > > >>>>> > > Any problem here? > >>>>> > We mentioned it to demonstrate the possibility of kvfree() being > >>>>> > called by __bpf_prog_put_noref(). > >>>>> > > >>>>> > Thanks. > >>>>> > -- Teng Qi > >>>>> > > >>>>> > On Wed, May 17, 2023 at 1:08 AM Yonghong Song <yhs@xxxxxxxx > >>>>> <mailto:yhs@xxxxxxxx> > >>>>> > <mailto:yhs@xxxxxxxx <mailto:yhs@xxxxxxxx>>> wrote: > >>>>> > > >>>>> > > >>>>> > > >>>>> > On 5/16/23 4:18 AM, starmiku1207184332@xxxxxxxxx > >>>>> <mailto:starmiku1207184332@xxxxxxxxx> > >>>>> > <mailto:starmiku1207184332@xxxxxxxxx > >>>>> <mailto:starmiku1207184332@xxxxxxxxx>> wrote: > >>>>> > > From: Teng Qi <starmiku1207184332@xxxxxxxxx > >>>>> <mailto:starmiku1207184332@xxxxxxxxx> > >>>>> > <mailto:starmiku1207184332@xxxxxxxxx > >>>>> <mailto:starmiku1207184332@xxxxxxxxx>>> > >>>>> > > > >>>>> > > Hi, bpf developers, > >>>>> > > > >>>>> > > We are developing a static tool to check the matching between > >>>>> > helpers and the > >>>>> > > context of hooks. During our analysis, we have discovered some > >>>>> > important > >>>>> > > findings that we would like to report. > >>>>> > > > >>>>> > > ‘kernel/bpf/syscall.c: 2097 __bpf_prog_put()’ shows that > >>>>> function > >>>>> > > bpf_prog_put_deferred() won`t be called in the condition of > >>>>> > > ‘in_irq() || irqs_disabled()’. > >>>>> > > if (in_irq() || irqs_disabled()) { > >>>>> > > INIT_WORK(&aux->work, bpf_prog_put_deferred); > >>>>> > > schedule_work(&aux->work); > >>>>> > > } else { > >>>>> > > > >>>>> > > bpf_prog_put_deferred(&aux->work); > >>>>> > > } > >>>>> > > > >>>>> > > We suspect this condition exists because there might be > >>>>> sleepable > >>>>> > operations > >>>>> > > in the callees of the bpf_prog_put_deferred() function: > >>>>> > > kernel/bpf/syscall.c: 2097 __bpf_prog_put() > >>>>> > > kernel/bpf/syscall.c: 2084 bpf_prog_put_deferred() > >>>>> > > kernel/bpf/syscall.c: 2063 __bpf_prog_put_noref() > >>>>> > > kvfree(prog->aux->jited_linfo); > >>>>> > > kvfree(prog->aux->linfo); > >>>>> > > >>>>> > Looks like you only have suspicion here. Could you find a real > >>>>> > violation > >>>>> > here where __bpf_prog_put() is called with !in_irq() && > >>>>> > !irqs_disabled(), but inside spin_lock or rcu read lock? I > >>>>> have not seen > >>>>> > things like that. > >>>>> > > >>>>> > > > >>>>> > > Additionally, we found that array prog->aux->jited_linfo is > >>>>> > initialized in > >>>>> > > ‘kernel/bpf/core.c: 157 bpf_prog_alloc_jited_linfo()’: > >>>>> > > prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo, > >>>>> > > sizeof(*prog->aux->jited_linfo), > >>>>> bpf_memcg_flags(GFP_KERNEL | > >>>>> > __GFP_NOWARN)); > >>>>> > > >>>>> > Any problem here? > >>>>> > > >>>>> > > > >>>>> > > Our question is whether the condition 'in_irq() || > >>>>> > irqs_disabled() == false' is > >>>>> > > sufficient for calling 'kvfree'. We are aware that calling > >>>>> > 'kvfree' within the > >>>>> > > context of a spin lock or an RCU lock is unsafe. > >>>>> > >>>>> Your above analysis makes sense if indeed that kvfree cannot appear > >>>>> inside a spin lock region or RCU read lock region. But is it true? > >>>>> I checked a few code paths in kvfree/kfree. It is either guarded > >>>>> with local_irq_save/restore or by > >>>>> spin_lock_irqsave/spin_unlock_irqrestore, etc. Did I miss > >>>>> anything? Are you talking about RT kernel here? > >>>>> > >>>>> > >>>>> > > > >>>>> > > Therefore, we propose modifying the condition to include > >>>>> > in_atomic(). Could we > >>>>> > > update the condition as follows: "in_irq() || > >>>>> irqs_disabled() || > >>>>> > in_atomic()"? > >>>>> > > > >>>>> > > Thank you! We look forward to your feedback. > >>>>> > > > >>>>> > > Signed-off-by: Teng Qi <starmiku1207184332@xxxxxxxxx > >>>>> <mailto:starmiku1207184332@xxxxxxxxx> > >>>>> > <mailto:starmiku1207184332@xxxxxxxxx > >>>>> <mailto:starmiku1207184332@xxxxxxxxx>>> > >>>>> > > >>>>>