We found that ip6t_do_table() was reading a stale table->private value
through the READ_ONCE() below, leading to a use-after-free when
dereferencing the per-CPU jumpstack values.
[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/ipv6/netfilter/ip6_tables.c?h=v5.4.77#n282]
addend = xt_write_recseq_begin();
private = READ_ONCE(table->private); /* Address dependency. */
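For reference, xt_write_recseq_begin() bumps this CPU's xt_recseq
sequence before the table walk; roughly (comments trimmed) from
include/linux/netfilter/x_tables.h in v5.4:

static inline unsigned int xt_write_recseq_begin(void)
{
	unsigned int addend;

	/* Low order bit of sequence is set if we already hold the recseq. */
	addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;

	/* Like write_seqcount_begin(), but addend is 0 or 1. */
	__this_cpu_add(xt_recseq.sequence, addend);
	smp_wmb();

	return addend;
}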
We were able to confirm that xt_replace_table() and __do_replace() had
already executed by logging the seqcount values. The seqcount value at
ip6t_do_table() was 1 more than the value at xt_replace_table().
Can you elaborate? Was that before xt_write_recseq_begin() or after?
The log in ip6t_do_table was after xt_write_recseq_begin().
[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/ipv6/netfilter/ip6_tables.c?h=v5.4.77#n282]
addend = xt_write_recseq_begin();
The log in xt_replace_table() was after the raw_read_seqcount() of
per_cpu(xt_recseq).
[https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/netfilter/x_tables.c?h=v5.4.77#n1400]
seqcount_t *s = &per_cpu(xt_recseq, cpu);
u32 seq = raw_read_seqcount(s);
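(Those two lines are from the loop in xt_replace_table() that waits for
every CPU's xt_recseq to become even again, i.e. for all readers still
walking the old table to finish; roughly, from v5.4:)

	for_each_possible_cpu(cpu) {
		seqcount_t *s = &per_cpu(xt_recseq, cpu);
		u32 seq = raw_read_seqcount(s);

		if (seq & 1) {
			do {
				cond_resched();
				cpu_relax();
			} while (seq == raw_read_seqcount(s));
		}
	}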
In this specific run, we observed that the value in ip6t_do_table()
after xt_write_recseq_begin() was 75316887. The value in
xt_replace_table() after raw_read_seqcount() of per_cpu(xt_recseq) was
75316886. This confirms that the private pointer assignment in
xt_replace_table() happened before the start of ip6t_do_table().
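For illustration, this is the kind of instrumentation that produces
those numbers; the trace_printk() calls below are hypothetical, not the
exact logging we used:

	/* net/ipv6/netfilter/ip6_tables.c, in ip6t_do_table() */
	addend = xt_write_recseq_begin();
	trace_printk("ip6t_do_table: cpu=%d recseq=%u\n",
		     smp_processor_id(), __this_cpu_read(xt_recseq.sequence));

	/* net/netfilter/x_tables.c, in xt_replace_table() */
	seqcount_t *s = &per_cpu(xt_recseq, cpu);
	u32 seq = raw_read_seqcount(s);
	trace_printk("xt_replace_table: cpu=%d recseq=%u\n", cpu, seq);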
You mean
"private = READ_ONCE(table->private); /* Address dependency. */" in
ip6t_do_table() did pick up the 'old' private pointer, AFTER
xt_replace_table() had updated it?
Yes, that is what we have observed from the logging.
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 525f674..417ea1b 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1431,11 +1431,11 @@ xt_replace_table(struct xt_table *table,
* Ensure contents of newinfo are visible before assigning to
* private.
*/
- smp_wmb();
- table->private = newinfo;
+ smp_mb();
+ WRITE_ONCE(table->private, newinfo);
The WRITE_ONCE looks correct.
/* make sure all cpus see new ->private value */
- smp_wmb();
+ smp_mb();
Is that to order it wrt. the seqcount sequence?
Correct, we want to ensure that table->private is updated and in sync
on all CPUs beyond that point.
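To make the pairing explicit, here is a condensed sketch of the writer
side with the proposed change next to the unchanged reader side; the
comments are our reading of the intent, not text from the patch:

	/* Writer side, xt_replace_table() with the proposed change */
	smp_mb();                            /* newinfo contents visible before publication */
	WRITE_ONCE(table->private, newinfo); /* pairs with READ_ONCE() in ip6t_do_table()   */
	smp_mb();                            /* order the ->private store before the reads
	                                        of per-CPU xt_recseq in the wait loop below */

	/* Reader side, ip6t_do_table() (unchanged) */
	addend = xt_write_recseq_begin();    /* makes this CPU's xt_recseq odd              */
	private = READ_ONCE(table->private); /* observing the old pointer here is the bug   */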
Do you have a way to reproduce such crashes?
I tried, to no avail, but I guess that's just because amd64 is more
forgiving.
Thanks!
Unfortunately we are seeing it on ARM64 regression systems which run a
variety of use cases, so the exact steps are not known.