On Tue, Nov 5, 2024 at 10:56 AM Saeed Mahameed <saeed@xxxxxxxxxx> wrote:
>
> On 31 Oct 10:34, Caleb Sander Mateos wrote:
> >Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
> >to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
> >whose completions don't need to be processed in tasklet context, this
> >adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
> >time spent on the tasklet_trylock() in tasklet_action_common(), with a
> >smaller amount spent on the atomic operations in tasklet_schedule(),
> >tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
> >TCP completions are handled by mlx5e_completion_event(), which schedules
> >NAPI to poll the queue, so they don't need tasklet processing.
> >
> >Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
> >overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
> >to be processed in tasklet context, so it can schedule the tasklet. CQs
> >that need tasklet processing have their interrupt comp handler set to
> >mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
> >don't need tasklet processing won't schedule the tasklet. To avoid
> >scheduling the tasklet multiple times during the same interrupt, only
> >schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
> >queue was empty before the new CQ was pushed to it.
> >
> >The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
> >may add a small cost for the userspace Infiniband CQs whose completions
> >are processed in tasklet context. But this seems worth it to avoid the
> >tasklet overhead for CQs that don't need it.
> >
> >Note that the mlx4 driver works the same way: it schedules the tasklet
> >in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
> >
> >Signed-off-by: Caleb Sander Mateos <csander@xxxxxxxxxxxxxxx>
> >Reviewed-by: Parav Pandit <parav@xxxxxxxxxx>
> >---
> >v3: revise commit message
> >v2: reorder variable declarations, describe CPU profile results
> >
> > drivers/net/ethernet/mellanox/mlx5/core/cq.c | 5 +++++
> > drivers/net/ethernet/mellanox/mlx5/core/eq.c | 5 +----
> > 2 files changed, 6 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> >index 4caa1b6f40ba..25f3b26db729 100644
> >--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> >+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
> >@@ -69,22 +69,27 @@ void mlx5_cq_tasklet_cb(struct tasklet_struct *t)
> > static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
> >                                    struct mlx5_eqe *eqe)
> > {
> >         unsigned long flags;
> >         struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> >+        bool schedule_tasklet = false;
> >
> >         spin_lock_irqsave(&tasklet_ctx->lock, flags);
> >         /* When migrating CQs between EQs will be implemented, please note
> >          * that you need to sync this point. It is possible that
> >          * while migrating a CQ, completions on the old EQs could
> >          * still arrive.
> >          */
> >         if (list_empty_careful(&cq->tasklet_ctx.list)) {
> >                 mlx5_cq_hold(cq);
>
> The condition here is counter intuitive, please add a comment that relates
> to the tasklet routine mlx5_cq_tasklet_cb, something like.
> /* If this list isn't empty, the tasklet is already scheduled, and not yet
>  * executing the list, the spinlock here guarantees the addition of this CQ
>  * will be seen by the next execution, so rescheduling the tasklet is not
>  * required */

Sure, will send out a v4.

> One other way to do this, is to flag tasklet_ctx.sched_flag = true, inside
> mlx5_add_cq_to_tasklet, and then schedule once at the end of eq irq processing
> if (tasklet_ctx.sched_flag == true). to avoid "too" early scheduling, but
> since the tasklet can't run until the irq handler returns, I think your
> solution shouldn't suffer from "too" early scheduling ..

Right, that was my thinking behind the list_empty(&tasklet_ctx->list) check.
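
For concreteness, here is a rough sketch of what I have in mind for v4 (not
the final patch; the exact comment wording may still change). The key points
are that schedule_tasklet is computed under the spinlock, before this CQ is
added to the list, and that tasklet_schedule() is only called after the lock
is dropped:

static void mlx5_add_cq_to_tasklet(struct mlx5_core_cq *cq,
                                   struct mlx5_eqe *eqe)
{
        unsigned long flags;
        struct mlx5_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
        bool schedule_tasklet = false;

        spin_lock_irqsave(&tasklet_ctx->lock, flags);
        /* When migrating CQs between EQs will be implemented, please note
         * that you need to sync this point. It is possible that
         * while migrating a CQ, completions on the old EQs could
         * still arrive.
         */
        if (list_empty_careful(&cq->tasklet_ctx.list)) {
                mlx5_cq_hold(cq);
                /* If the tasklet CQ work list isn't empty, the tasklet is
                 * already scheduled (or running) and hasn't processed the
                 * list yet; the spinlock guarantees it will see this CQ, so
                 * there is no need to reschedule it. Only schedule the
                 * tasklet when this CQ is the first one added to an empty
                 * list.
                 */
                schedule_tasklet = list_empty(&tasklet_ctx->list);
                list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
        }
        spin_unlock_irqrestore(&tasklet_ctx->lock, flags);

        if (schedule_tasklet)
                tasklet_schedule(&tasklet_ctx->task);
}

With this, the unconditional tasklet_schedule() in mlx5_eq_comp_int() goes
away, so EQs whose CQs never use the tasklet (e.g. the NAPI-driven TCP queues)
never touch it.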