On Wed, Dec 15, 2021 at 09:24:21AM -0700, Jens Axboe wrote:
> +	spin_lock(&nvmeq->sq_lock);
> +	while (!rq_list_empty(*rqlist)) {
> +		struct request *req = rq_list_pop(rqlist);
> +		struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> +		memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << nvmeq->sqes),
> +			absolute_pointer(&iod->cmd), sizeof(iod->cmd));
> +		if (++nvmeq->sq_tail == nvmeq->q_depth)
> +			nvmeq->sq_tail = 0;

So this doesn't even use the new helper added in patch 2?  I think
this should call nvme_sq_copy_cmd().

The rest looks identical to the incremental patch I posted, so I guess
the performance degradation measured on the first try was a measurement
error?