Here's v2 of this patchset. It should be solid now; the previous version didn't handle tag flushing correctly, nor did it handle multiple hardware queue types.

The idea here is that we can reduce the cost of getting a tag for a new request if we don't get tags piecemeal. Add a per-ctx tag cache, and grab a batch of tags whenever it's empty. If it's not empty, we can just find a free bit there.

/sys/kernel/debug/block/<dev>/<hctx>/<cpu>/tag_hit holds some stats associated with this, so you can check how it's doing. I've seen nice improvements with this in testing.

-- Jens Axboe
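
For anyone who wants a concrete picture of the batching scheme described above, here is a minimal single-threaded user-space sketch. It is not the code from the patches: every name in it (struct tag_map, struct ctx_tag_cache, ctx_get_tag(), ctx_flush_cache(), put_tag(), TAG_BATCH, NR_TAGS) is hypothetical, the batch size and tag count are arbitrary, and the hit/miss counters only stand in for whatever tag_hit actually reports. A real implementation would need atomic bit operations and proper waiting when no tags are free. __builtin_ctzll is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

#define NR_TAGS   256             /* total tags in the shared map (assumed) */
#define NR_WORDS  (NR_TAGS / 64)
#define TAG_BATCH 16              /* tags claimed per refill (assumed) */
#define NO_TAG    (-1)

/* Shared tag map: a set bit means the tag is taken. */
struct tag_map {
    uint64_t word[NR_WORDS];
};

/* Per-ctx cache: a set bit in 'mask' is a tag this ctx owns but has not used. */
struct ctx_tag_cache {
    uint64_t mask;        /* cached tags, relative to 'base' */
    unsigned int base;    /* tag number of bit 0 in 'mask' */
    unsigned long hits;   /* allocations served from the cache */
    unsigned long misses; /* allocations that had to refill first */
};

/* Claim up to TAG_BATCH free tags from one word of the shared map. */
static int cache_refill(struct tag_map *map, struct ctx_tag_cache *c)
{
    for (unsigned int w = 0; w < NR_WORDS; w++) {
        uint64_t free_bits = ~map->word[w];
        uint64_t batch = 0;

        for (int n = 0; n < TAG_BATCH && free_bits; n++) {
            uint64_t lowest = free_bits & -free_bits; /* lowest free bit */
            batch |= lowest;
            free_bits &= ~lowest;
        }
        if (!batch)
            continue;             /* this word is full, try the next */
        map->word[w] |= batch;    /* mark the whole batch as taken */
        c->mask = batch;
        c->base = w * 64;
        return 1;
    }
    return 0;
}

/* Get a tag: use the cache if it has one, else grab a new batch first. */
int ctx_get_tag(struct tag_map *map, struct ctx_tag_cache *c)
{
    if (c->mask) {
        c->hits++;
    } else {
        c->misses++;
        if (!cache_refill(map, c))
            return NO_TAG;
    }
    int bit = __builtin_ctzll(c->mask); /* find a free bit in the cache */
    c->mask &= c->mask - 1;             /* clear the lowest set bit */
    return (int)c->base + bit;
}

/* Request completed: release the tag in the shared map. */
void put_tag(struct tag_map *map, int tag)
{
    assert(tag >= 0 && tag < NR_TAGS);
    map->word[tag / 64] &= ~(UINT64_C(1) << (tag % 64));
}

/* Flush: hand unused cached tags back to the shared map. */
void ctx_flush_cache(struct tag_map *map, struct ctx_tag_cache *c)
{
    map->word[c->base / 64] &= ~c->mask;
    c->mask = 0;
}
```

With zero-initialized structures, the first ctx_get_tag() call counts as a miss and claims 16 tags from the shared map in one go; the following 15 calls are hits that never touch the shared map. The flush step matters because cached tags look taken to everyone else until they are handed back.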