Re: [nf-next] netfilter: nf_conntrack, add IPS_HW_OFFLOAD status bit

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Mon, 20 Apr 2020 23:05:54 +0200

On Mon, Apr 20, 2020 at 11:45:44AM -0500, Bodong Wang wrote:
> On 4/20/2020 10:58 AM, Pablo Neira Ayuso wrote:
> > On Mon, Apr 20, 2020 at 10:46:54AM -0500, Bodong Wang wrote:
> > > On 4/20/2020 10:33 AM, Pablo Neira Ayuso wrote:
> > > > On Mon, Apr 20, 2020 at 10:28:00AM -0500, Bodong Wang wrote:
> > > > > On 4/20/2020 10:15 AM, Pablo Neira Ayuso wrote:
> > > > > > On Mon, Apr 20, 2020 at 09:58:10AM -0500, Bodong Wang wrote:
> > > > > > [...]
> > > > > > > @@ -796,6 +799,16 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
> > > > > > >     				       FLOW_OFFLOAD_DIR_REPLY,
> > > > > > >     				       stats[1].pkts, stats[1].bytes);
> > > > > > >     	}
> > > > > > > +
> > > > > > > +	/* Clear HW_OFFLOAD immediately once lastused stops updating. This can
> > > > > > > +	 * happen in two scenarios:
> > > > > > > +	 *
> > > > > > > +	 * 1. TC rule on a higher level device (e.g. vxlan) was offloaded, but
> > > > > > > +	 *    HW driver is unloaded.
> > > > > > > +	 * 2. One of the shared block drivers is unloaded.
> > > > > > > +	 */
> > > > > > > +	if (!lastused)
> > > > > > > +		clear_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status);
> > > > > > >     }
> > > > > > Better to unconditionally clear the flag after the entry is removed
> > > > > > from hardware instead of relying on the lastused field?
> > > > > Functionality-wise, it should work. The current approach is more about
> > > > > keeping the set/clear in the same domain, so there is no need to ask each
> > > > > vendor to take care of this bit.
> > > > No need to ask each vendor; what I mean is to deal with this from
> > > > flow_offload_work_del(), see attached patch.
> > > Oh, I see. That is already covered in my patch below. However,
> > > flow_offload_work_del() is only triggered after the timeout expires (30 seconds).
> > > Users will see an incorrect CT state within this 30-second window, which the
> > > lastused-based clear_bit solves.
> > For TCP fin/rst, the removal from hardware occurs once the
> > workqueue has a chance to run.
> > 
> > For UDP, or in case the TCP connection stalls or no packets are seen
> > for 30 seconds, the flow is removed from hardware once that
> > 30-second timeout expires.
> > 
> > The IPS_HW_OFFLOAD_BIT flag should be cleaned up when the flow is
> > effectively removed from hardware.
> > 
> > Why do you want to clean it up earlier than that? With your approach,
> > the flag is cleared but the flow is still in hardware?
> 
> In normal cases (no driver unload, etc.), the flow is immediately removed by
> flow_offload_work_del(). Requests to remove the HW flow go from the netfilter
> layer to the driver via block_cb->cb. We're well covered in such cases.
> 
> The lastused check is more for corner cases such as: iperf is still running,
> but the driver is unloaded. In that case, the driver removes all HW flows
> without notifying netfilter.

The driver should invoke the flowtable garbage collector to let it clean
up the entries before the driver is unloaded.
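A minimal user-space model of the cleanup path suggested above, assuming the flag is cleared unconditionally whenever an entry is removed from hardware, and that a driver about to unload asks the garbage collector to tear down its entries instead of dropping them silently. All names (`model_flow`, `model_work_del`, `model_gc_step`, `model_driver_unload`, the `MODEL_*` constants) are hypothetical; this is not the nf_flow_table code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the conntrack IPS_HW_OFFLOAD status bit. */
#define MODEL_HW_OFFLOAD_BIT	(1u << 0)
/* Idle timeout used by the model (the 30 seconds discussed above). */
#define MODEL_IDLE_TIMEOUT	30u

struct model_flow {
	unsigned int status;	/* conntrack-like status bits */
	unsigned int last_seen;	/* time (s) of the last observed packet */
	bool teardown;		/* fin/rst seen, or owning driver going away */
	bool in_hw;		/* entry currently programmed in hardware */
	int owner;		/* id of the driver that offloaded the flow */
};

/* Hardware removal: the flag is cleared here unconditionally, so it
 * never advertises an offload that no longer exists. */
static void model_work_del(struct model_flow *f)
{
	f->in_hw = false;
	f->status &= ~MODEL_HW_OFFLOAD_BIT;
}

/* One garbage-collector pass: the only place entries leave hardware. */
static void model_gc_step(struct model_flow *flows, size_t n, unsigned int now)
{
	assert(flows != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_flow *f = &flows[i];

		assert(now >= f->last_seen);
		bool expired = now - f->last_seen >= MODEL_IDLE_TIMEOUT;

		if (f->in_hw && (f->teardown || expired))
			model_work_del(f);
	}
}

/* Before unloading, the driver marks its own flows for teardown and
 * lets the GC remove them, rather than flushing hardware behind its back. */
static void model_driver_unload(struct model_flow *flows, size_t n,
				int driver, unsigned int now)
{
	for (size_t i = 0; i < n; i++)
		if (flows[i].owner == driver)
			flows[i].teardown = true;
	model_gc_step(flows, n, now);
}
```

In this model, unloading driver 1 at t=5 clears the flag on its flows in the same GC pass, while a flow owned by another driver keeps the flag until its 30-second idle timeout, so the flag tracks actual hardware state without a stats-based heuristic.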

> flow_offload_work_del() is only called once the timeout expires, so
> the HW_OFFLOAD indication is incorrect during the timeout period.
> Meanwhile, lastused stops updating once the driver unloading process
> destroys the flow counters. So we rely on this field to clear the
> HW_OFFLOAD bit to cover such corner cases.

This corner case is interesting.

Could you submit this patch without the lastused trick? Then, we can
revisit how the driver invokes the garbage collector to deal with the
cleanup?

This whole model is based on the garbage collector being the one
responsible for cleaning up the flowtable. If there are multiple entry
points to add/remove flowtable entries, we'll end up with
complicated locking sooner or later.
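One way to picture the single-entry-point argument, as a minimal C11 sketch under stated assumptions: other contexts (a driver, the packet path on fin/rst) only *request* removal by setting an atomic flag, and the garbage collector is the sole context that actually removes entries. The names are hypothetical and the real flowtable's synchronization is not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_entry {
	atomic_bool teardown_requested;	/* may be set from any context */
	bool removed;			/* written only by the GC */
};

/* Any context: ask for removal, never remove directly. */
static void model_request_teardown(struct model_entry *e)
{
	atomic_store_explicit(&e->teardown_requested, true,
			      memory_order_release);
}

/* GC context only: the single place that removes entries.
 * Returns the number of entries removed in this pass. */
static size_t model_gc_run(struct model_entry *entries, size_t n)
{
	size_t removed = 0;

	assert(entries != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_entry *e = &entries[i];

		if (!e->removed &&
		    atomic_load_explicit(&e->teardown_requested,
					 memory_order_acquire)) {
			/* Hardware delete and flag clear would happen here. */
			e->removed = true;
			removed++;
		}
	}
	return removed;
}
```

Because only the GC writes `removed`, the sole cross-context write in this model is the atomic request flag; letting a driver delete entries directly would add a second writer and need a lock around each entry.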

Thanks.