Re: [nf PATCH 2/5] netfilter: nf_tables: Add locking for NFT_MSG_GETRULE_RESET requests

Phil Sutter <phil@xxxxxx> · Tue, 26 Sep 2023 14:14:05 +0200

On Tue, Sep 26, 2023 at 12:09:35PM +0200, Pablo Neira Ayuso wrote:
> Hi Phil,
> 
> On Tue, Sep 26, 2023 at 11:34:43AM +0200, Phil Sutter wrote:
> > On Mon, Sep 25, 2023 at 09:53:17PM +0200, Florian Westphal wrote:
> > > Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > > > On Sat, Sep 23, 2023 at 06:18:13PM +0200, Florian Westphal wrote:
> > > > > callback_that_might_reset()
> > > > > {
> > > > > 	try_module_get ...
> > > > > 	rcu_read_unlock()
> > > > > 	mutex_lock(net->commit_mutex)
> > > > > 	  dumper();
> > > > > 	mutex_unlock(net->commit_mutex)
> > > > > 	rcu_read_lock();
> > > > > 	module_put()
> > > > > }
> > > > >
> > > > > should do the trick.
> > > > 
> > > > Idiom above LGTM, *except for net->commit_mutex*. Please do not use
> > > > ->commit_mutex: This will stall ruleset updates for no reason, netlink
> > > > dump would grab and release such mutex for each netlink_recvmsg() call
> > > > and netlink dump side will always retry because of NLM_F_DUMP_INTR.
> > > 
> > > It will stall updates, but for good reason: we are making changes to the
> > > expressions state.
> > 
> > This also disqualifies the use of Pablo's suggested reset_lock, right?
> 
> Quick summary:
> 
> We are currently discussing whether it makes sense to add a new lock
> or not. The commit_mutex stalls updates, but netlink dumps retrieve
> listings in chunks, that is, one recvmsg() call from userspace (to
> retrieve one list chunk) will grab the mutex, then release it until
> the next recvmsg() call. Between these two calls an update is still
> possible. The question is whether it is worth stalling an ongoing
> listing or updates.

Thanks for the summary. Assuming that a blocked commit is only postponed
until the current chunk has been filled and submitted to user space, I
don't see how it would make a practical difference for the reset command
whether commit_mutex or reset_lock (or a dedicated reset_mutex) is used.

> There is the NLM_F_DUMP_INTR mechanism in place that tells that an
> interference has occurred while keeping the listing lockless.
> 
> Unless I am missing anything, the goal is to fix two different
> processes that are listing at the same time, that is, two processes
> running a netlink dump at the same time that are resetting the
> stateful expressions in the ruleset.
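
For reference, the interference signal mentioned above reaches user space
as the NLM_F_DUMP_INTR flag in the netlink message header (defined in
<linux/netlink.h>). A minimal sketch of how a dump reader could test for
it; the helper name dump_interrupted() is made up for illustration and is
not taken from nft or libmnl:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: returns true if the kernel marked this dump
 * message as interrupted, i.e. the dumped data changed while the
 * listing was in progress and the result may be inconsistent.
 */
static bool dump_interrupted(const struct nlmsghdr *nlh)
{
	assert(nlh != NULL);
	return (nlh->nlmsg_flags & NLM_F_DUMP_INTR) != 0;
}
```

A reader that sees the flag set would typically discard what it has
collected so far and restart the dump.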

Here's a simple repro I use to verify the locking approach (only rule
reset for now):

| set -e
| 
| # assumption: fall back to nft from PATH if NFT is not set by the caller
| NFT=${NFT:-nft}
| 
| RULESET='flush ruleset
| table t {
|       chain c {
|               counter packets 23 bytes 42
|       }
| }'
| 
| trap "$NFT list ruleset" EXIT
| for ((i = 0; i < 10000; i++)); do
|       echo "iter $i"
|       $NFT -f - <<< "$RULESET"
|       $NFT list ruleset | grep -q 'packets 23 bytes 42'
|       # run two resets concurrently to provoke the race
|       $NFT reset rules >/dev/null &
|       pid=$!
|       $NFT reset rules >/dev/null
|       wait $pid
|       #$NFT list ruleset | grep 'packets'
|       $NFT list ruleset | grep -q 'packets 0 bytes 0'
| done

If the two calls clash, both subtract the same dumped values from the
counter, so it underflows and the rule ends up with huge counter values.
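
To spell out the arithmetic, here is a small user-space model. It is only
a sketch under the assumption that a reset reports the current totals and
then subtracts exactly those totals from the counter; struct counter and
all function names are hypothetical and not the kernel's code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a stateful counter expression. */
struct counter {
	uint64_t packets;
	uint64_t bytes;
};

/* Dump step: report the current totals to the caller. */
static struct counter counter_fetch(const struct counter *c)
{
	return *c;
}

/* Reset step: subtract the totals that were just reported. */
static void counter_sub(struct counter *c, const struct counter *seen)
{
	c->packets -= seen->packets;	/* unsigned, wraps modulo 2^64 */
	c->bytes -= seen->bytes;
}

/* Two resets interleave: both fetch before either one subtracts. */
static struct counter racing_resets(struct counter c)
{
	struct counter a = counter_fetch(&c);
	struct counter b = counter_fetch(&c);

	counter_sub(&c, &a);
	counter_sub(&c, &b);
	return c;
}

/* Two resets run one after the other, as if serialized by a lock. */
static struct counter serialized_resets(struct counter c)
{
	struct counter a = counter_fetch(&c);

	counter_sub(&c, &a);
	assert(c.packets == 0 && c.bytes == 0);

	struct counter b = counter_fetch(&c);

	counter_sub(&c, &b);
	return c;
}
```

Starting from packets 23 bytes 42 as in the repro, racing_resets() ends
up with packets = 2^64 - 23 and bytes = 2^64 - 42, while
serialized_resets() leaves both at 0.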