We found that ip6t_do_table() was reading a stale table->private value
through the READ_ONCE() below, leading to a use-after-free when
dereferencing the per-CPU jumpstack values.
[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/ipv6/netfilter/ip6_tables.c?h=v5.4.77#n282]
addend = xt_write_recseq_begin();
private = READ_ONCE(table->private); /* Address dependency. */
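For reference, xt_write_recseq_begin() bumps this CPU's xt_recseq
sequence before the table walk; roughly (comments trimmed) from
include/linux/netfilter/x_tables.h in v5.4:

static inline unsigned int xt_write_recseq_begin(void)
{
	unsigned int addend;

	/* Low order bit of sequence is set if we already hold the recseq. */
	addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;

	/* Like write_seqcount_begin(), but addend is 0 or 1. */
	__this_cpu_add(xt_recseq.sequence, addend);
	smp_wmb();

	return addend;
}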
We were able to confirm that xt_replace_table() and __do_replace() had
already executed by logging the seqcount values. The seqcount value at
ip6t_do_table() was 1 more than the value at xt_replace_table().
Can you elaborate? Was that before xt_write_recseq_begin() or after?
The log in ip6t_do_table was after xt_write_recseq_begin().
[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/ipv6/netfilter/ip6_tables.c?h=v5.4.77#n282]
addend = xt_write_recseq_begin();
The log in xt_replace_table() was after the raw_read_seqcount() of
per_cpu(xt_recseq).
[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/netfilter/x_tables.c?h=v5.4.77#n1400]
seqcount_t *s = &per_cpu(xt_recseq, cpu);
u32 seq = raw_read_seqcount(s);
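(Those two lines are from the loop in xt_replace_table() that waits for
every CPU's xt_recseq to become even again, i.e. for all readers still
walking the old table to finish; roughly, from v5.4:)

	for_each_possible_cpu(cpu) {
		seqcount_t *s = &per_cpu(xt_recseq, cpu);
		u32 seq = raw_read_seqcount(s);

		if (seq & 1) {
			do {
				cond_resched();
				cpu_relax();
			} while (seq == raw_read_seqcount(s));
		}
	}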
In this specific run, we observed that the value in ip6t_do_table()
after xt_write_recseq_begin() was 75316887. The value in
xt_replace_table() after raw_read_seqcount() of per_cpu(xt_recseq) was
75316886. This confirms that the private pointer assignment in
xt_replace_table() happened before the start of ip6t_do_table().
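For illustration, this is the kind of instrumentation that produces
those numbers; the trace_printk() calls below are hypothetical, not the
exact logging we used:

	/* net/ipv6/netfilter/ip6_tables.c, in ip6t_do_table() */
	addend = xt_write_recseq_begin();
	trace_printk("ip6t_do_table: cpu=%d recseq=%u\n",
		     smp_processor_id(), __this_cpu_read(xt_recseq.sequence));

	/* net/netfilter/x_tables.c, in xt_replace_table() */
	seqcount_t *s = &per_cpu(xt_recseq, cpu);
	u32 seq = raw_read_seqcount(s);
	trace_printk("xt_replace_table: cpu=%d recseq=%u\n", cpu, seq);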
You mean
"private = READ_ONCE(table->private); /* Address dependency. */" in
ip6t_do_table() did pick up the 'old' private pointer, AFTER
xt_replace_table() had updated it?
Yes, that is what we have observed from the logging.
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 525f674..417ea1b 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1431,11 +1431,11 @@ xt_replace_table(struct xt_table *table,
* Ensure contents of newinfo are visible before assigning to
* private.
*/
- smp_wmb();
- table->private = newinfo;
+ smp_mb();
+ WRITE_ONCE(table->private, newinfo);
The WRITE_ONCE looks correct.
/* make sure all cpus see new ->private value */
- smp_wmb();
+ smp_mb();
Is that to order it wrt. the seqcount sequence?
Correct, we want to ensure that table->private is updated and in sync
on all CPUs beyond that point.
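To make the pairing explicit, here is a condensed sketch of the writer
side with the proposed change next to the unchanged reader side; the
comments are our reading of the intent, not text from the patch:

	/* Writer side, xt_replace_table() with the proposed change */
	smp_mb();                            /* newinfo contents visible before publication */
	WRITE_ONCE(table->private, newinfo); /* pairs with READ_ONCE() in ip6t_do_table()   */
	smp_mb();                            /* order the ->private store before the reads
	                                        of per-CPU xt_recseq in the wait loop below */

	/* Reader side, ip6t_do_table() (unchanged) */
	addend = xt_write_recseq_begin();    /* makes this CPU's xt_recseq odd              */
	private = READ_ONCE(table->private); /* observing the old pointer here is the bug   */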
Do you have a way to reproduce such crashes?
I tried, to no avail, but I guess that's just because amd64 is more
forgiving.
Thanks!
Unfortunately we are seeing it on ARM64 regression systems which run a
variety of use cases, so the exact steps are not known.