Hi,

We can do better than implementing batch tag allocs as just repeated
calls into sbitmap. Add a sbitmap helper to grab a batch all at once,
and use that instead.

Testing with instrumentation added, we get very close to the full batch
count. For NVMe, if I run with 32 batch submits, the actual successful
batch size is ~31 on average. This is close to ideal, as one hw queue
will have a 63 tag size and hence we get 31 of 32 tags once every 1/32
alloc. This could be improved, but wasting the extra cycles in sbitmap
to skip to the next index for that case doesn't seem worth it.

-- 
Jens Axboe
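
For illustration only, here is a minimal user-space sketch of the idea of grabbing a batch of tags with one atomic update instead of one call per tag. It is not the kernel sbitmap code: `struct toy_word`, `toy_get_batch()`, `toy_count()` and `TOY_DEPTH` are hypothetical names, and the sketch assumes a single 64-bit word holding 63 tags, matching the hw queue size mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: one word with 63 usable tags, as in the 63-tag hw queue case. */
#define TOY_DEPTH 63

/* Hypothetical toy type: a set bit means the tag is in use. */
struct toy_word {
	_Atomic uint64_t bits;
};

/* Count the tags in a mask returned by toy_get_batch(). */
static unsigned int toy_count(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}

/*
 * Try to claim up to nr_tags free tags from the word in one atomic
 * update. Returns a mask of the claimed bits, which may hold fewer
 * than nr_tags bits, or 0 if the word is full.
 */
static uint64_t toy_get_batch(struct toy_word *w, unsigned int nr_tags)
{
	const uint64_t valid = (UINT64_C(1) << TOY_DEPTH) - 1;
	uint64_t old = atomic_load(&w->bits);

	assert(nr_tags > 0);

	for (;;) {
		uint64_t free_bits = ~old & valid;
		uint64_t claim = 0;
		unsigned int got = 0;

		/* Pick the lowest free bits, at most nr_tags of them. */
		while (free_bits && got < nr_tags) {
			uint64_t lowest = free_bits & (~free_bits + 1);

			claim |= lowest;
			free_bits &= ~lowest;
			got++;
		}
		if (!claim)
			return 0;

		/*
		 * Publish all claimed bits at once. On failure, old is
		 * reloaded with the current value and we retry.
		 */
		if (atomic_compare_exchange_weak(&w->bits, &old, old | claim))
			return claim;
	}
}
```

Starting from an empty word, two successive requests for 32 tags in this sketch return 32 and then 31 tags, since only 63 fit in the word, and a third request returns an empty mask.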