On Tue, Jan 16, 2024 at 09:11:32PM +0800, Heng Qi wrote:
> Currently, when each time the driver attempts to update the coalescing
> parameters for a vq, it needs to kick the device.
> The following path is observed:
>   1. Driver kicks the device;
>   2. After the device receives the kick, CPU scheduling occurs and DMA
>      multiple buffers multiple times;
>   3. The device completes processing and replies with a response.
>
> When large-queue devices issue multiple requests and kick the device
> frequently, this often interrupt the work of the device-side CPU.
> In addition, each vq request is processed separately, causing more
> delays for the CPU to wait for the DMA request to complete.
>
> These interruptions and overhead will strain the CPU responsible for
> controlling the path of the DPU, especially in multi-device and
> large-queue scenarios.
>
> To solve the above problems, we internally tried batch request,
> which merges requests from multiple queues and sends them at once.
> We conservatively tested 8 queue commands and sent them together.
> The DPU processing efficiency can be improved by 8 times, which
> greatly eases the DPU's support for multi-device and multi-queue DIM.
>
> Suggested-by: Xiaoming Zhao <zxm377917@xxxxxxxxxxxxxxx>
> Signed-off-by: Heng Qi <hengqi@xxxxxxxxxxxxxxxxx>

...

> @@ -3546,16 +3552,32 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>  		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
>  		if (update_moder.usec != rq->intr_coal.max_usecs ||
>  		    update_moder.pkts != rq->intr_coal.max_packets) {
> -			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> -							       update_moder.usec,
> -							       update_moder.pkts);
> -			if (err)
> -				pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> -					 dev->name, qnum);
> -			dim->state = DIM_START_MEASURE;
> +			coal->coal_vqs[j].vqn = cpu_to_le16(rxq2vq(i));
> +			coal->coal_vqs[j].coal.max_usecs = cpu_to_le32(update_moder.usec);
> +			coal->coal_vqs[j].coal.max_packets = cpu_to_le32(update_moder.pkts);
> +			rq->intr_coal.max_usecs = update_moder.usec;
> +			rq->intr_coal.max_packets = update_moder.pkts;
> +			j++;
>  		}
>  	}
>
> +	if (!j)
> +		goto ret;
> +
> +	coal->num_entries = cpu_to_le32(j);
> +	sg_init_one(&sgs, coal, sizeof(struct virtnet_batch_coal) +
> +		    j * sizeof(struct virtio_net_ctrl_coal_vq));
> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
> +				  VIRTIO_NET_CTRL_NOTF_COAL_VQS_SET,
> +				  &sgs))
> +		dev_warn(&vi->vdev->dev, "Failed to add dim command\n.");
> +
> +	for (i = 0; i < j; i++) {
> +		rq = &vi->rq[(coal->coal_vqs[i].vqn) / 2];

Hi Heng Qi,

The type of .vqn is __le16, but here it is used as an integer in
host byte order. Perhaps this should be (completely untested!):

		rq = &vi->rq[le16_to_cpu(coal->coal_vqs[i].vqn) / 2];

> +		rq->dim.state = DIM_START_MEASURE;
> +	}
> +
> +ret:
>  	rtnl_unlock();
>  }
>