On 10/31/24 17:34, Caleb Sander Mateos wrote:
> Currently, the mlx5_eq_comp_int() interrupt handler schedules a tasklet
> to call mlx5_cq_tasklet_cb() if it processes any completions. For CQs
> whose completions don't need to be processed in tasklet context, this
> adds unnecessary overhead. In a heavy TCP workload, we see 4% of CPU
> time spent on the tasklet_trylock() in tasklet_action_common(), with a
> smaller amount spent on the atomic operations in tasklet_schedule(),
> tasklet_clear_sched(), and locking the spinlock in mlx5_cq_tasklet_cb().
> TCP completions are handled by mlx5e_completion_event(), which schedules
> NAPI to poll the queue, so they don't need tasklet processing.
>
> Schedule the tasklet in mlx5_add_cq_to_tasklet() instead to avoid this
> overhead. mlx5_add_cq_to_tasklet() is responsible for enqueuing the CQs
> to be processed in tasklet context, so it can schedule the tasklet. CQs
> that need tasklet processing have their interrupt comp handler set to
> mlx5_add_cq_to_tasklet(), so they will schedule the tasklet. CQs that
> don't need tasklet processing won't schedule the tasklet. To avoid
> scheduling the tasklet multiple times during the same interrupt, only
> schedule the tasklet in mlx5_add_cq_to_tasklet() if the tasklet work
> queue was empty before the new CQ was pushed to it.
>
> The additional branch in mlx5_add_cq_to_tasklet(), called for each EQE,
> may add a small cost for the userspace Infiniband CQs whose completions
> are processed in tasklet context. But this seems worth it to avoid the
> tasklet overhead for CQs that don't need it.
>
> Note that the mlx4 driver works the same way: it schedules the tasklet
> in mlx4_add_cq_to_tasklet() and only if the work queue was empty before.
>
> Signed-off-by: Caleb Sander Mateos <csander@xxxxxxxxxxxxxxx>
> Reviewed-by: Parav Pandit <parav@xxxxxxxxxx>

@Saeed, @Leon, @Tariq: I assume you will apply this one and include it in
the next mlx5 PR. Please correct me if I'm wrong.

Thanks,

Paolo
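
For reference, the scheduling rule described in the quoted commit message (schedule the tasklet only when the work queue was empty before the new CQ was pushed) can be sketched as a small userspace model. This is not the driver code: every name below (model_cq, model_tasklet_ctx, model_add_cq_to_tasklet, model_tasklet_run) is hypothetical, the queued flag is an assumption added to keep the model's list consistent, a counter stands in for tasklet_schedule(), and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these names are not the mlx5 driver's. */
struct model_cq {
	struct model_cq *next;
	bool queued;                 /* assumption: guards against double enqueue */
};

struct model_tasklet_ctx {
	struct model_cq *head;
	struct model_cq *tail;
	unsigned int schedule_calls; /* stands in for tasklet_schedule() calls */
};

/* Models the comp handler of a CQ that needs tasklet processing. */
static void model_add_cq_to_tasklet(struct model_tasklet_ctx *ctx,
				    struct model_cq *cq)
{
	bool schedule = (ctx->head == NULL); /* was the work queue empty? */

	if (cq->queued)
		return; /* already pending, so the queue was not empty */

	cq->next = NULL;
	cq->queued = true;
	if (ctx->tail)
		ctx->tail->next = cq;
	else
		ctx->head = cq;
	ctx->tail = cq;

	if (schedule)
		ctx->schedule_calls++; /* only the first CQ of a batch schedules */
}

/* Models the tasklet body: drain the queue, return how many CQs ran. */
static unsigned int model_tasklet_run(struct model_tasklet_ctx *ctx)
{
	unsigned int n = 0;

	while (ctx->head) {
		struct model_cq *cq = ctx->head;

		ctx->head = cq->next;
		cq->next = NULL;
		cq->queued = false;
		n++;
	}
	ctx->tail = NULL;
	assert(ctx->head == NULL);
	return n;
}
```

In this model, pushing several CQs before the tasklet runs bumps schedule_calls once, and a CQ whose handler never calls model_add_cq_to_tasklet() never bumps it at all, which is the case the patch is meant to make cheap.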