On Mon, 2023-06-26 at 16:57 -0700, longli@xxxxxxxxxxxxxxxxx wrote:
> From: Long Li <longli@xxxxxxxxxxxxx>
>
> It's inefficient to ring the doorbell page every time a WQE is posted to
> the received queue. Excessive MMIO writes result in CPU spending more
> time waiting on LOCK instructions (atomic operations), resulting in
> poor scaling performance.
>
> Move the code for ringing doorbell page to where after we have posted all
> WQEs to the receive queue during a callback from napi_poll().
>
> With this change, tests showed an improvement from 120G/s to 160G/s on a
> 200G physical link, with 16 or 32 hardware queues.
>
> Tests showed no regression in network latency benchmarks on single
> connection.
>
> While we are making changes in this code path, change the code for
> ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The
> hardware specification specifies that it should set to 0. Although
> currently the hardware doesn't enforce the check, in the future releases
> it may do.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")

Uhmmm... this looks like a performance improvement to me, more suitable
for the net-next tree ?!? (Note that net-next is closed now).

In any case you must avoid empty lines in the tag area.

If you really intend targeting the -net tree, please repost fixing the
above and explicitly specifying the target tree in the subj prefix.

thanks!

Paolo
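
For readers following the quoted description, here is a minimal sketch of the difference between ringing the doorbell per WQE and ringing it once after a batch has been posted. It is a user-space simulation, not the MANA driver code: struct rx_queue, post_wqe(), ring_doorbell(), refill_per_wqe() and refill_batched() are hypothetical names made up for illustration, and the "doorbell" is a plain counter standing in for an MMIO write.

```c
#include <stdint.h>

/* Hypothetical stand-in for a receive queue; not the driver's real types. */
struct rx_queue {
	uint32_t head;            /* index of the next WQE slot to post */
	uint32_t doorbell_value;  /* last value "written" to the doorbell */
	uint32_t doorbell_writes; /* number of simulated MMIO writes */
};

/* Simulated doorbell: publish the current head and count the write. */
void ring_doorbell(struct rx_queue *q)
{
	q->doorbell_value = q->head;
	q->doorbell_writes++;
}

/* Post one WQE by advancing the head; no doorbell write happens here. */
void post_wqe(struct rx_queue *q)
{
	q->head++;
}

/* Per-WQE pattern: every posted WQE costs one doorbell write. */
void refill_per_wqe(struct rx_queue *q, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++) {
		post_wqe(q);
		ring_doorbell(q);
	}
}

/* Batched pattern: post all WQEs first, then ring the doorbell once. */
void refill_batched(struct rx_queue *q, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		post_wqe(q);
	if (n > 0)
		ring_doorbell(q);
}
```

Refilling 64 WQEs on a zero-initialized queue costs 64 simulated doorbell writes with refill_per_wqe() and one with refill_batched(), while the final doorbell value is the same in both cases. Whether a real device tolerates the deferred notification depends on its specification, which this sketch does not model.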