Here is a series of perf improvements and debug/trace fixes from Mike, who has
this to say about the patches...

The AIP SDMA interrupt handling is inefficient:
- A slab entry is allocated for each sent packet
  This is despite the fact that there is a ring for each possible send
  slot that could be occupied by a tx descriptor
- The interrupt handling/NAPI is lock happy and has a mixed-up notion of
  producer and consumer
  The ring should be a ring of tx descriptors vs. a ring of pointers
  The consumer of descriptors should be the xmit side of the TX
  The producer of the descriptors is the SDMA interrupt handling and
  NAPI tx completion
  There is certainly no locking required in the interrupt/TX napi tx queue
  There is no locking required in the xmit side since that is held off
  by NAPI code

Note that these patches are also staged publicly on our GitHub site for easy
browsing in context. https://github.com/cornelisnetworks/linux

---

Mike Marciniszyn (6):
      IB/hfi1: Remove cache and embed txreq in ring
      IB/hfi1: Get rid of hot path divide
      IB/hfi1: Get rid of tx priv backpointer
      IB/hfi1: Tune netdev xmit cachelines
      IB/hfi1: Remove atomic completion count
      IB/hfi1: Add ring consumer and producers traces

 drivers/infiniband/hw/hfi1/ipoib.h    |  76 +++++---
 drivers/infiniband/hw/hfi1/ipoib_tx.c | 314 ++++++++++++++-------------------
 drivers/infiniband/hw/hfi1/trace_tx.h |  71 +++++++
 3 files changed, 246 insertions(+), 215 deletions(-)

--
-Denny
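
For readers browsing the series, the ring layout described above can be
sketched roughly as follows. This is a minimal single-threaded illustration
and not the hfi1 code: every name here (demo_txreq, demo_ring, RING_SIZE and
the helper functions) is hypothetical, and the memory ordering that a real
driver would need between the xmit side and the completion side is left out.
It assumes a power-of-two ring size so that slot lookup uses a mask rather
than a divide, descriptors embedded in the ring by value so nothing is
allocated per packet, and one index written only by each side so that no lock
or shared atomic count is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* hypothetical size */
#define RING_MASK (RING_SIZE - 1u)

static_assert((RING_SIZE & RING_MASK) == 0,
	      "RING_SIZE must be a power of two");

/* Hypothetical descriptor, stored in the ring by value. */
struct demo_txreq {
	uint32_t len;
	bool complete;
};

struct demo_ring {
	struct demo_txreq items[RING_SIZE]; /* descriptors, not pointers */
	uint32_t head; /* next slot to reclaim; written only by completion */
	uint32_t tail; /* next slot to fill; written only by xmit */
};

/* Slots currently in flight; unsigned wraparound keeps this correct. */
static uint32_t demo_ring_used(const struct demo_ring *r)
{
	return r->tail - r->head;
}

/* xmit side: claim the next free descriptor, or NULL when the ring is full. */
static struct demo_txreq *demo_ring_claim(struct demo_ring *r)
{
	struct demo_txreq *tx;

	if (demo_ring_used(r) == RING_SIZE)
		return NULL;
	tx = &r->items[r->tail & RING_MASK]; /* mask instead of divide */
	tx->len = 0;
	tx->complete = false;
	r->tail++;
	return tx;
}

/* completion side: reclaim finished descriptors in order, up to budget. */
static uint32_t demo_ring_reclaim(struct demo_ring *r, uint32_t budget)
{
	uint32_t done = 0;

	while (done < budget && demo_ring_used(r) != 0) {
		struct demo_txreq *tx = &r->items[r->head & RING_MASK];

		if (!tx->complete)
			break; /* stop at the first unfinished descriptor */
		r->head++;
		done++;
	}
	return done;
}
```

In this sketch the xmit path would call demo_ring_claim() and stop the queue
when it returns NULL, while the completion path would mark descriptors
complete and call demo_ring_reclaim() with its poll budget.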