On 11/27/19 2:05 PM, John Garry wrote: > On 27/11/2019 01:46, Jens Axboe wrote: >>>> Would be interesting to check the generated code for that, ideally we'd >>>> get rid of the extra load for that case, even if it is in the same >>>> cacheline. >>>> >>> I checked the disassembly and we still have the load instead of the add. >>> >>> This is not surprising, as the compiler would not know for certain that >>> we point to a field within the same struct. But at least we still should >>> point to a close memory. >>> >>> Note that the pointer could be dropped, which would remove the load, but >>> then we have many if-elses which could be slower, not to mention that >>> the blk-mq-tag code deals in bitmap pointers anyway. > > Hi Jens, > >> It might still be worthwhile to do: >> >> if (tags->ptr == &tags->__default) >> foo(&tags->__default); >> >> to make it clear, as that branch will predict easily. > > Not sure. So this code does produce the same assembly, as we still need > to do the tags->ptr load for the comparison. > > And then if you consider blk_mq_get_tags() as an example, there is no > other hot value available to indicate whether the shared tags are used > to decide whether to use &tags->__default. > > I'll consider it more. > After talking to our compiler folks I guess it should be possible to resurrect your original patch (with both the embedded bitmap and the point to the bitmap), linking the pointer to the embedded bitmap in the non-shared case. Then we can access the pointer in all cases, but in the non-shared case that will then point to the embedded one, thus avoiding the possible cache miss. I'll see how that goes. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@xxxxxxx +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer