But that does bring up the question of whether we should always be doing the nvme_process_cq(nvmeq) after IO submission. For direct/hipri IO, maybe it's better to make the submission path faster and skip it?

Yes, I am okay with removing the opportunistic nvme_process_cq in the submission path. Even under deeply queued IO, I've not seen this provide any measurable benefit.
+100 for removing it. Never really understood why it gets us anything...