Your report provided these stats with one-completion dominance for the single-threaded case. Does it also hold if you run multiple fio threads per core?
It's useless to run more threads on that core; it's already fully utilized. That single thread is already posting a fair amount of submissions, so I don't see how adding more fio jobs can help in any way.