> Hi,
>
> On Wed, Nov 03, 2021 at 11:31:36AM +0200, Volodymyr Mytnyk wrote:
> > From: Volodymyr Mytnyk <vmytnyk@xxxxxxxxxxx>
> >
> > On a busy system with a large number (a few thousand) of HW offloaded
> > flows, it is possible to hit a situation where some of the conntrack
> > flows are stuck in the conntrack table (as offloaded) and cannot be
> > removed by the user.
> >
> > This behaviour happens if the user has configured conntrack using the
> > tc sub-system, offloaded those flows to HW and then deleted the tc
> > configuration from the Linux system by deleting the tc qdiscs.
> >
> > When the qdiscs are removed, nf_flow_table_free() is called to do the
> > cleanup of HW offloaded flows in the conntrack table.
> >
> >  ...
> >  process_one_work
> >    tcf_ct_flow_table_cleanup_work()
> >      nf_flow_table_free()
> >
> > nf_flow_table_free() does the following things:
> >
> >   1. cancels the gc workqueue
> >   2. marks all flows as teardown
> >   3. executes nf_flow_offload_gc_step() once for each flow to
> >      trigger the correct flow teardown procedure (e.g., allocates
> >      work to delete the HW flow and marks the flow as "dying").
> >   4. waits for all scheduled flow offload works to be finished.
> >   5. executes nf_flow_offload_gc_step() once for each flow to
> >      trigger the deletion of flows.
> >
> > Root cause:
> >
> > In step 3, nf_flow_offload_gc_step() expects to move the flow to the
> > "dying" state by using nf_flow_offload_del() and to delete the flow in
> > the next nf_flow_offload_gc_step() iteration. But if the flow is in the
> > "pending" state for some reason (e.g., reading HW stats), it will not be
> > moved to the "dying" state as expected by nf_flow_offload_gc_step() and
> > will not be marked as "dead" for deletion.
> >
> > In step 5, nf_flow_offload_gc_step() assumes that all flows are marked
> > as "dead" and will be deleted by this call, but this is not true since
> > the state was not set during the previous nf_flow_offload_gc_step()
> > call.
> >
> > This issue causes some of the flows to get stuck in the connection
> > tracking system or not be released properly.
> >
> > To fix this problem, add a nf_flow_table_offload_flush() call between
> > steps 2 and 3, to make sure no other flow offload works will be in the
> > "pending" state during step 3.
>
> Thanks for the detailed report.
>
> I'm attaching two patches, the first one is a preparation patch. The
> second patch flushes the pending work, then it sets the teardown flag
> on all flows in the flowtable and it forces a garbage collector run to
> queue work to remove the flows from hardware, then it flushes this new
> pending work and (finally) it forces another garbage collector run to
> remove the entry from the software flowtable. Compile-tested only.

Hi Pablo,

Thanks for reviewing the changes and investigating the problem. I will
check the provided patches and get back to you.

Regards,
Volodymyr