subashab@xxxxxxxxxxxxxx <subashab@xxxxxxxxxxxxxx> wrote: > > I'm fine with this change but AFAIU this is just a cleanup since > > this part isn't a read-sequence as no 'shared state' is accessed/read > > between > > the seqcount begin and the do{}while. smb_rmb placement should not > > matter here. > > > > Did I miss anything? > > > > Thanks. > > Hi Florian > > To provide more background on this, we are seeing occasional crashes in a > regression rack in the packet receive path where there appear to be > some rules being modified concurrently. > > Unable to handle kernel NULL pointer dereference at virtual > address 000000000000008e > pc : ip6t_do_table+0x5d0/0x89c > lr : ip6t_do_table+0x5b8/0x89c > ip6t_do_table+0x5d0/0x89c > ip6table_filter_hook+0x24/0x30 > nf_hook_slow+0x84/0x120 > ip6_input+0x74/0xe0 > ip6_rcv_finish+0x7c/0x128 > ipv6_rcv+0xac/0xe4 > __netif_receive_skb+0x84/0x17c > process_backlog+0x15c/0x1b8 > napi_poll+0x88/0x284 > net_rx_action+0xbc/0x23c > __do_softirq+0x20c/0x48c > > We found that ip6t_do_table was reading stale values from the READ_ONCE > leading to use after free when dereferencing per CPU jumpstack values. > > [https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/ipv6/netfilter/ip6_tables.c?h=v5.4.77#n282] > addend = xt_write_recseq_begin(); > private = READ_ONCE(table->private); /* Address dependency. */ > > We were able to confirm that the xt_replace_table & __do_replace > had already executed by logging the seqcount values. The value of > seqcount at ip6t_do_table was 1 more than the value at xt_replace_table. Can you elaborate? Was that before xt_write_recseq_begin() or after? > The seqcount read at xt_replace_table also showed that it was an even value > and hence meant that there was no conccurent writer instance > (xt_write_recseq_begin) > at that point. Ok, so either ip6_do_table was done or just starting (and it should not matter which one). > [https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/netfilter/x_tables.c?h=v5.4.77#n1401] > u32 seq = raw_read_seqcount(s); > > This means that table assignment at xt_replace_table did not take effect > as expected. You mean "private = READ_ONCE(table->private); /* Address dependency. */" in ip6_do_table did pick up the 'old' private pointer, AFTER xt_replace had updated it? > [https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/netfilter/x_tables.c?h=v5.4.77#n1386] > smp_wmb(); > table->private = newinfo; > > /* make sure all cpus see new ->private value */ > smp_wmb(); > > We want to know if this barrier usage is as expected here. > Alternatively, would changing this help here - > > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index 525f674..417ea1b 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -1431,11 +1431,11 @@ xt_replace_table(struct xt_table *table, > * Ensure contents of newinfo are visible before assigning to > * private. > */ > - smp_wmb(); > - table->private = newinfo; > + smp_mb(); > + WRITE_ONCE(table->private, newinfo); The WRITE_ONCE looks correct. > /* make sure all cpus see new ->private value */ > - smp_wmb(); > + smp_mb(); Is that to order wrt. seqcount_sequence? Do you have a way to reproduce such crashes? I tried to no avail but I guess thats just because amd64 is more forgiving. Thanks!