Re: data-race in nf_tables_newtable / nf_tables_newtable

Gabriel Ryan <gabe@xxxxxxxxxxxxxxx> · Mon, 22 Aug 2022 16:29:01 -0400

Hi Florian,

I just looked at the lock event trace from our report and it looks
like two distinct commit mutexes were held when the race was
triggered. I think the race is probably on the table_handle variable
incremented at net/netfilter/nf_tables_api.c:1221, and not on the
table->handle field being written to.

Racing increments to table_handle could cause it to either overcount
or undercount. Could that be an issue?
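
To make the concern concrete, here is a minimal userspace sketch (not
kernel code) of one interleaving I have in mind. It assumes the increment
is performed as a separate load and store with nothing serializing the
two tasks; the helper names are hypothetical and only table_handle comes
from the report.

```c
#include <stdint.h>

/* Stand-in for the shared counter; in this sketch it is a plain global. */
static uint64_t table_handle;

/* Hypothetical helper: the load half of "++table_handle". */
static uint64_t handle_load(void)
{
	return table_handle;
}

/* Hypothetical helper: the store half, returning the handle handed out. */
static uint64_t handle_store(uint64_t seen)
{
	table_handle = seen + 1;
	return table_handle;
}

/*
 * Replays one bad interleaving by hand: both tasks load the counter
 * before either stores it back. Returns 1 if both tasks ended up with
 * the same handle, 0 otherwise. The final counter value is written to
 * *final_count.
 */
int simulate_lost_update(uint64_t *final_count)
{
	uint64_t seen_a, seen_b, handle_a, handle_b;

	table_handle = 0;
	seen_a = handle_load();           /* task A reads 0 */
	seen_b = handle_load();           /* task B reads 0 before A writes back */
	handle_a = handle_store(seen_a);  /* task A stores 1 */
	handle_b = handle_store(seen_b);  /* task B also stores 1 */

	*final_count = table_handle;
	return handle_a == handle_b;
}
```

In this replay two tables are created but the counter only advances
once, so both tables would receive the same handle value.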

Best,

Gabe

On Fri, Aug 19, 2022 at 8:35 AM Florian Westphal <fw@xxxxxxxxx> wrote:
>
> Abhishek Shah <abhishek.shah@xxxxxxxxxxxx> wrote:
> > Hi all,
> >
> > We found a race involving the table->handle variable here
> > <https://elixir.bootlin.com/linux/v5.18-rc5/source/net/netfilter/nf_tables_api.c#L1221>.
> > This race advances the pointer, which can cause out-of-bounds memory
> > accesses in the future. Please let us know what you think.
> >
> > Thanks!
> >
> >
> > *---------------------Report-----------------*
> > *read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
> >  nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
> >  nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
>
> [..]
>
> > *read-write* to 0xffffffff883a01e8 of 8 bytes by task 6541 on cpu 1:
> >  nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
> >  nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
>
> [..]
>
> I don't understand.  Like all batch operations, nf_tables_newtable is
> supposed to run with the transaction mutex held, i.e. parallel execution
> is not expected.
>
> There is a lockdep assertion at start of nf_tables_newtable(); I
> don't see how it's possible that two threads can run this concurrently.

-- 
Gabriel Ryan
PhD Candidate at Columbia University