On 27/11/2019 16:52, Hannes Reinecke wrote:
On 11/27/19 3:44 PM, John Garry wrote:
On 27/11/2019 14:21, Jens Axboe wrote:
On 11/27/19 6:05 AM, John Garry wrote:
On 27/11/2019 01:46, Jens Axboe wrote:
Would be interesting to check the generated code for that,
ideally we'd
get rid of the extra load for that case, even if it is in the same
cacheline.
I checked the disassembly and we still have the load instead of
the add.
This is not surprising, as the compiler would not know for certain
that
we point to a field within the same struct. But at least we still
should
point to a close memory.
Note that the pointer could be dropped, which would remove the
load, but
then we have many if-elses which could be slower, not to mention that
the blk-mq-tag code deals in bitmap pointers anyway.
Hi Jens,
It might still be worthwhile to do:
if (tags->ptr == &tags->__default)
foo(&tags->__default);
to make it clear, as that branch will predict easily.
Not sure. So this code does produce the same assembly, as we still need
to do the tags->ptr load for the comparison.
Hi Jens,
How can it be the same? The approach in the patchset needs to load
*tags->ptr, this one needs tags->ptr. That's the big difference.
In the patch for this thread, we have:
@@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct
blk_mq_alloc_data *data)
WARN_ON_ONCE(1);
return BLK_MQ_TAG_FAIL;
}
- bt = &tags->breserved_tags;
+ bt = tags->breserved_tags;
tag_offset = 0;
} else {
- bt = &tags->bitmap_tags;
+ bt = tags->bitmap_tags;
tag_offset = tags->nr_reserved_tags;
}
So current code gets bt pointer by simply offsetting a certain
distance from tags pointer - that is the add I mention.
With the change in this patch, we need to load memory at address
&tags->bitmap_tags to get bt - this is the load I mention.
So for this:
if (tags->ptr == &tags->__default)
We load &tags->ptr to get the pointer value for comparison vs
&tags->__default.
There must be something I'm missing...
The point here was that the load might refer to _other_ memory locations
(as it's being allocated separately),
I think that we're talking about something different.
thus incurring a cache miss.
With embedded tag bitmaps we'll load from the same cache line
(hopefully), and won't get a performance hit.
But I'll just wait to see what you come up with.
Thanks,
John