On 11/26/19 10:23 AM, John Garry wrote: > On 26/11/2019 17:11, Jens Axboe wrote: >> On 11/26/19 9:54 AM, John Garry wrote: >>> On 26/11/2019 15:14, Jens Axboe wrote: >>>> On 11/26/19 2:14 AM, Hannes Reinecke wrote: >>>>> Instead of allocating the tag bitmap in place we should be using a >>>>> pointer. This is in preparation for shared host-wide bitmaps. >>>> >>>> Not a huge fan of this, it's an extra indirection in the hot path >>>> of both submission and completion. >>> >>> Hi Jens, >>> >>> Thanks for having a look. >>> >>> I checked the disassembly for blk_mq_get_tag() as a sample - which I >>> assume is one hot path function which you care about - and the cost of >>> the indirection is a load instruction instead of an add, denoted by ***, >>> below: >> > > Hi Jens, > >> I'm not that worried about an extra instruction, my worry is the extra >> load is from different memory. When it's embedded in the struct, we're >> on the same cache line or adjacent. > > Right, so the earlier iteration of this series kept the embedded struct > and we simply pointed at that, so I wouldn't expect a caching issue of > different memory in that case. That would be a much better solution for the common case, my concern here is slowing down the fast path for device that don't need shared tags. Would be interesting to check the generated code for that, ideally we'd get rid of the extra load for that case, even if it is in the same cacheline. -- Jens Axboe