> On Nov 17, 2022, at 4:44 PM, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> <joel@xxxxxxxxxxxxxxxxx> wrote:
>>
>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
>> causes a networking test to fail in the teardown phase.
>>
>> The failure happens during: ip netns del <name>
>>
>> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
>> call_rcu_flush() to revert to the old behavior. With that, the test passes.
>>
>> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>> ---
>>  net/sched/sch_generic.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index a9aadc4e6858..63fbf640d3b2 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -1067,7 +1067,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
>>
>>         trace_qdisc_destroy(qdisc);
>>
>> -       call_rcu(&qdisc->rcu, qdisc_free_cb);
>> +       call_rcu_flush(&qdisc->rcu, qdisc_free_cb);
>>  }
>
> I took a look at this one.
>
> qdisc_free_cb() is essentially freeing : Some per-cpu memory, and the
> 'struct Qdisc'
>
> I do not see why we need to force a flush for this (small ?) piece of memory.

I’ll try to drop that and rerun the test, and get back to you. It could be
that there is a different callback that this flush() is compensating for, or
something. I am pretty sure at one point, dropping this patch made the test
fail most of the time. Now it passes 100%.

I’ll also attempt to collect a complete trace, maybe I’ll learn some
networking code in the process..

Thanks!