On Mon, Feb 10 2014, Christoph Hellwig wrote:
> > I also think we should be getting more utility out of threading
> > guarantees. So, if there's only one thread active per device we don't
> > need any device counters to be atomic. Likewise, u32 read/write is an
> > atomic operation, so we might be able to use sloppy counters for the
> > target and host stuff (one per CPU that are incremented/decremented on
> > that CPU ... this will only work using CPU locality ... completion on
> > same CPU but that seems to be an element of a lot of stuff nowadays).
>
> The blk-mq code is aiming for CPU locality, but there are no hard
> guarantees. I'm also not sure always bouncing around the I/O submission
> is a win, but it might be something to play around with at the block
> layer.
>
> Jens, did you try something like this earlier?

Nope, I've always thought that if you needed to bounce submission around,
you would already have lost. Hopefully we're moving to a model where you
at least have X completion queues and can tell the hardware where you
want the completion. You'd be a lot better off just placing the tasks
differently, for the cases where you are not on the right node.

If we're talking about shoving to a dedicated thread to avoid all the
locking, that's going to hurt you on the sync workloads as well. And
depending on your device and peak load, it'll kill you on the peak
performance as well. That's why blk-mq was designed to handle parallel
activity more efficiently.

-- 
Jens Axboe
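
[Editor's illustration, not part of the original message.] For readers
unfamiliar with the "sloppy counter" idea quoted above, here is a minimal
user-space sketch in C. It is not kernel code and not anything proposed in
this thread: the names sloppy_counter, sloppy_add, sloppy_read and NR_SLOTS
are hypothetical, and the "cpu" index is passed in by hand. It assumes each
slot is only ever modified by its owning CPU, which is the locality
requirement the quoted text mentions.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS   4   /* hypothetical number of CPUs */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One slot per CPU, padded so that slots do not share a cache line. */
struct sloppy_slot {
    int32_t count;
    char pad[CACHE_LINE - sizeof(int32_t)];
};

struct sloppy_counter {
    struct sloppy_slot slot[NR_SLOTS];
};

/*
 * Adjust the slot owned by `cpu`. Assumption: only the owner of a slot
 * calls this for that slot, so a plain (non-atomic) add is enough.
 */
static void sloppy_add(struct sloppy_counter *c, unsigned cpu, int32_t delta)
{
    assert(cpu < NR_SLOTS);
    c->slot[cpu].count += delta;
}

/*
 * Sum all slots. With concurrent writers the result is only approximate
 * ("sloppy"), and a real implementation would need suitable annotations
 * or relaxed atomic loads for the cross-CPU reads.
 */
static int32_t sloppy_read(const struct sloppy_counter *c)
{
    int32_t sum = 0;
    for (unsigned i = 0; i < NR_SLOTS; i++)
        sum += c->slot[i].count;
    return sum;
}
```

A caller would do sloppy_add(&c, cpu, 1) at submission and
sloppy_add(&c, cpu, -1) at completion on that same CPU. The fast path then
touches only a CPU-local cache line, at the price of a slower and inexact
read.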