Inspired by the recent fast path DIO patch from Daniel Ehrenberg, I spent some time micro-optimizing the "slow" direct IO submission path. This should get rid of large parts of the memset and dio access costs that Dan noticed.

I moved everything that isn't needed in the completion handler back onto the stack, to make it more likely to be cache hot. The patch also inlines everything to allow the compiler to optimize more. In particular, it can now split up the sdio structure into individual variables and then get rid of unnecessary initializations. This costs some text size, but I think it's worth it for such a hot path.

The dio is also allocated from a slab cache now, which avoids some fast path overhead.

Dan, could you please test this patch with your test case, comparing against the fast path again? Please test with CONFIG_CC_OPTIMIZE_FOR_SIZE and CONFIG_OPTIMIZE_INLINING both disabled.

Thanks,
-Andi
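
For readers who want the idea in isolation, here is a minimal userspace sketch of the stack/heap split described above. It is an illustration under stated assumptions, not code from the patch: the struct layouts, field names, and the functions submit_io(), submit_block() and complete_block() are all hypothetical, and malloc() merely stands in for the kernel slab allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: state the completion handler needs, so it must
 * outlive submission and is the only part that is heap allocated. */
struct dio_completion {
    size_t bytes_done;   /* bytes reported complete so far */
    int    refcount;     /* blocks still in flight */
};

/* Hypothetical: state used only while submitting. It lives on the
 * submitter's stack; once the helpers are inlined, the compiler is
 * free to keep these fields in registers and drop dead stores. */
struct dio_submit {
    size_t cur_offset;
    size_t block_size;
    size_t blocks_queued;
};

/* Queue one block: advances submit-only state, takes one reference. */
static inline void submit_block(struct dio_submit *s,
                                struct dio_completion *c)
{
    s->cur_offset += s->block_size;
    s->blocks_queued++;
    c->refcount++;
}

/* Submit len bytes in block_size chunks. Returns the long-lived
 * completion state, or NULL on bad arguments or allocation failure.
 * A trailing partial block is rounded up to a full block here. */
struct dio_completion *submit_io(size_t len, size_t block_size)
{
    struct dio_completion *c;
    struct dio_submit s = { 0, block_size, 0 };  /* stack only */

    if (block_size == 0)
        return NULL;

    c = malloc(sizeof(*c));  /* stand-in for a slab allocation */
    if (!c)
        return NULL;
    c->bytes_done = 0;       /* initialize only what is used */
    c->refcount = 0;

    while (s.cur_offset < len)
        submit_block(&s, c);

    return c;
}

/* Completion side: touches only the heap-allocated part.
 * Returns 1 when the last in-flight block has completed. */
int complete_block(struct dio_completion *c, size_t bytes)
{
    assert(c->refcount > 0);
    c->bytes_done += bytes;
    return --c->refcount == 0;
}
```

The caller owns the returned structure and frees it once complete_block() reports the last block. The point of the layout is that nothing in struct dio_submit is ever read after submit_io() returns, so it never needs to be zeroed in bulk or touched in a cold heap object.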