On Fri, Aug 07, 2020 at 04:47:56PM -0400, Joel Fernandes wrote:
> Hi,
> Adding more of us working on RCU as well. Johan from another team at
> Google discovered a likely issue in openswitch, details below:
>
> On Fri, Aug 7, 2020 at 11:32 AM Johan Knöös <jknoos@xxxxxxxxxx> wrote:
> >
> > On Tue, Aug 4, 2020 at 8:52 AM Gregory Rose <gvrose8192@xxxxxxxxx> wrote:
> > >
> > > On 8/3/2020 12:01 PM, Johan Knöös via discuss wrote:
> > > > Hi Open vSwitch contributors,
> > > >
> > > > We have found openvswitch is causing double-freeing of memory. The
> > > > issue was not present in kernel version 5.5.17 but is present in
> > > > 5.6.14 and newer kernels.
> > > >
> > > > After reverting the RCU commits below for debugging, enabling
> > > > slub_debug, lockdep, and KASAN, we see the warnings at the end of this
> > > > email in the kernel log (the last one shows the double-free). When I
> > > > revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: openvswitch:
> > > > fix possible memleak on destroy flow-table"), the symptoms disappear.
> > > > While I have a reliable way to reproduce the issue, I unfortunately
> > > > don't yet have a process that's amenable to sharing. Please take a
> > > > look.
> > > >
> > > > 189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch()
> > > > 77a40f97030b rcu: Remove kfree_rcu() special casing and lazy-callback handling
> > > > e99637becb2e rcu: Add support for debug_objects debugging for kfree_rcu()
> > > > 0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() work
> > > > 569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
> > > > a35d16905efc rcu: Add basic support for kfree_rcu() batching
>
> Note that these reverts were only for testing the same code, because
> he was testing 2 different kernel versions. One of them did not have
> this set. So I asked him to revert. There's no known bug in the
> reverted code itself. But somehow these patches do make it harder for
> him to reproduce the issue. Perhaps they adjust timing?
>
> > > > Thanks,
> > > > Johan Knöös
> > >
> > > Let's add the author of the patch you reverted and the Linux netdev
> > > mailing list.
> > >
> > > - Greg
> >
> > I found we also sometimes get warnings from
> > https://elixir.bootlin.com/linux/v5.5.17/source/kernel/rcu/tree.c#L2239
> > under similar conditions even on kernel 5.5.17, which I believe may be
> > related. However, it's much rarer and I don't have a reliable way of
> > reproducing it. Perhaps 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 only
> > increases the frequency of a pre-existing bug.
>
> This is interesting, because I saw kbuild warn me recently [1] about
> it as well. Though, I was actually intentionally messing with the
> segcblist. I plan to debug it next week, but the warning itself is
> unlikely to be caused by my patch IMHO (since it is slightly
> orthogonal to what I changed).
>
> [1] https://lore.kernel.org/lkml/20200720005334.GC19262@shao2-debian/
>
> But then again, I have not heard reports of this warning firing. Paul,
> has this come to your radar recently?

I have not seen any recent WARNs in rcu_do_batch(). I am guessing that
this is one of the last two in that function? If so, have you tried using
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? That Kconfig option is designed to help
locate double frees via RCU.

							Thanx, Paul
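
For reference, a kernel configuration fragment enabling the option suggested
above might look like the sketch below. Only CONFIG_DEBUG_OBJECTS_RCU_HEAD is
named in this thread; CONFIG_DEBUG_OBJECTS and
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT are added here on the assumption that the
kernel under test follows the mainline lib/Kconfig.debug layout, where the
RCU-head option depends on the debugobjects core. Please check this against
the Kconfig of the tree actually being built.

```
# Assumed prerequisite: the debugobjects infrastructure itself
CONFIG_DEBUG_OBJECTS=y
# Assumed: have debugobjects checking active from boot
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
# Option named in the thread: track rcu_head objects queued via RCU
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
```

With this enabled, queueing the same rcu_head a second time before its first
callback has run should produce a warning at the point of the second call,
which is usually closer to the offending code than a later slab or KASAN
report.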