On Thu, Nov 17, 2022 at 9:29 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Thu, Nov 17, 2022 at 1:16 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > > On Nov 17, 2022, at 2:29 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Nov 17, 2022 at 05:40:40PM +0000, Joel Fernandes wrote:
> > >>> On Thu, Nov 17, 2022 at 5:38 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >>>
> > >>> On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >>>>>
> > >>>>> Hello Eric,
> > >>>>>
> > >>>>> On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > >>>>>> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > >>>>>> <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > >>>>>>> causes a networking test to fail in the teardown phase.
> > >>>>>>>
> > >>>>>>> The failure happens during: ip netns del <name>
> > >>>>>>
> > >>>>>> And ? What happens then next ?
> > >>>>>
> > >>>>> The test is doing the 'ip netns del <name>' and then polling for the
> > >>>>> disappearance of a network interface name for up to 5 seconds. I believe it is
> > >>>>> using netlink to get a table of interfaces. That polling is timing out.
> > >>>>>
> > >>>>> Here are some more details from the test's owner (copy-pasting from another
> > >>>>> bug report):
> > >>>>> In the cleanup, we remove the netns, which causes the veth pair to be
> > >>>>> removed automatically, so we use a poll to check whether the veth in the root
> > >>>>> netns still exists, to know whether the cleanup is done.
> > >>>>>
> > >>>>> Here is a public link to the code that is failing (it's in golang):
> > >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> > >>>>>
> > >>>>> Here is a public link to the line of code in the actual test leading up to
> > >>>>> the above path (this is the test that is run:
> > >>>>> network.RoutingFallthrough.ipv4_only_primary):
> > >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> > >>>>>
> > >>>>>>> Using ftrace, I found the callbacks it was queuing, which this series fixes.
> > >>>>>>> Use call_rcu_flush() to revert to the old behavior. With that, the test
> > >>>>>>> passes.
> > >>>>>>
> > >>>>>> What is this test about ? What barrier was used to make it not flaky ?
> > >>>>>
> > >>>>> I provided the links above, let me know if you have any questions.
> > >>>>>
> > >>>>>> Was it depending on some undocumented RCU behavior ?
> > >>>>>
> > >>>>> This is a new RCU feature posted here for significant power savings on
> > >>>>> battery-powered devices:
> > >>>>> https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> > >>>>>
> > >>>>> There is also an LPC presentation about the same; I can dig up the link if
> > >>>>> you are interested.
> > >>>>>
> > >>>>>> Maybe adding a sysctl to force the flush would be better for functional tests ?
> > >>>>>>
> > >>>>>> I would rather change the test(s) than add call_rcu_flush(),
> > >>>>>> adding merge conflicts to future backports.
> > >>>>>
> > >>>>> I am not too sure about that; I think a user might expect the network
> > >>>>> interface to disappear from the networking tables quickly enough without
> > >>>>> dealing with barriers or kernel internals. However, I added the author of
> > >>>>> the test to this email in the hope that he can provide his point of view
> > >>>>> as well.
> > >>>>>
> > >>>>> The general approach we are taking with this sort of thing is to use
> > >>>>> call_rcu_flush(), which is basically the same as call_rcu() for systems with
> > >>>>> CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> > >>>>> above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU, so it's
> > >>>>> only Android and ChromeOS that are using it. I am adding Jie to share any
> > >>>>> input; he is from the networking team and knows this test well.
> > >>>>>
> > >>>>>
> > >>>>
> > >>>> I do not know what this RCU_LAZY thing is, but IMO this should be opt-in
> > >>>
> > >>> You should read the links I sent you. We already tried opt-in;
> > >>> Thomas Gleixner made a point at LPC that we should not add new APIs
> > >>> for this purpose and confuse kernel developers.
> > >>>
> > >>>> For instance, only kfree_rcu() should use it.
> > >>>
> > >>> No. Most of the call_rcu() usages are for freeing memory, so the
> > >>> consensus is we should apply this as opt-out and fix issues along the
> > >>> way. We already did a lot of research/diligence on seeing which users
> > >>> need conversion.
> > >>>
> > >>>> We cannot review hundreds of call_rcu() call sites and decide if
> > >>>> adding arbitrary delays could hurt.
> > >>>
> > >>> That work has already been done as much as possible; please read the
> > >>> links I sent.
> > >>
> > >> Also, just to add: this test is a bit weird / a corner case, as anyone
> > >> expecting a quick response from call_rcu() is broken by design.
> > >> However, for these callbacks, it does not matter much which API they
> > >> use, as they are quite infrequent for power savings.
> > >
> > > The "broken by design" is a bit strong. Some of those call_rcu()
> > > invocations have been around for the better part of 20 years, after all.
> > >
> > > That aside, I do hope that we can arrive at something that will enhance
> > > battery lifetime while avoiding unnecessary disruption. But we are
> > > unlikely to be able to completely avoid disruption. As this email
> > > thread illustrates. ;-)
> >
> > Another approach with these 3 patches could be to keep the call_rcu() but
> > add an rcu_barrier() after them. I think people running 'ip netns del'
> > should not have to wait too long for their RCU callbacks to run and remove
> > user-visible state. But I would need suggestions from networking experts on
> > which CBs of these 3 to do this for, or whether to do it for all of them.
> >
> > Alternatively, we could also patch just the test with a new knob that does
> > an rcu_barrier(). But I dislike that, as it does not fix it for all users.
> > Probably the ip utilities would also need a patch then.
>
> Normally we have an rcu_barrier() in the netns dismantle path already, at a
> strategic location (in cleanup_net()).
>
> Maybe the issue here is that some particular layers need another one.
> Or we need to release a blocking reference before the call_rcu().
> Some call_rcu() usages might not be optimal in this respect.
>
> We should not add an rcu_barrier() after a call_rcu(); we prefer
> factoring these expensive operations.

Sounds good! The dst_destroy_rcu() function appears complex to break down
(similar to what you suggested in 2/3). The dst->ops->destroy() can decrement
refcounts and so forth. We could audit all such dst->ops users, but I wonder if
it is safer to use call_rcu_flush() for this patch.

Thanks,

- Joel