On Fri, Oct 28, 2016 at 4:22 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 10/28/2016 03:32 AM, Linus Walleij wrote:
>>
>> This is without using Bartlomiej's clever hack to pretend we have
>> 2 elements in the HW queue though. His early tests indicate that
>> it doesn't help much: the performance regression we see is due to
>> lack of block scheduling.
>
> A simple dd test, I don't see how that can be slower due to lack of
> scheduling. There's nothing to schedule there, just issue them in order?

Yeah, I guess you're right. It could be partly due to not having
activated front and back merges properly, as Christoph pointed out.
I'll look closer at this.

> So that would probably be where I would start looking. A blktrace of the
> in-kernel code and the blk-mq enabled code would perhaps be
> enlightening. I don't think it's worth looking at the more complex test
> cases until the dd test case is at least as fast as the non-mq version.

Yeah.

> Was that with CFQ, btw, or what scheduler did it run?

CFQ, just plain defconfig.

> It'd be nice to NOT have to rely on that fake QD=2 setup, since it will
> mess with the IO scheduling as well.

I agree.

>> I try to find a way forward with this, and also massage the MMC/SD
>> code to be more MQ friendly to begin with (like only pick requests
>> when we get a request notification and stop pulling NULL requests
>> off the queue) but it's really a messy piece of code.
>
> Yeah, it does look pretty messy... I'd be happy to help out with that,
> and particularly in figuring out why the direct conversion is slower for
> a basic 'dd' test case.

I'm looking into it.

Yours,
Linus Walleij