Re: [PATCH RFC] io_uring/rsrc: add last-lookup cache hit to io_rsrc_node_lookup()

Jens Axboe <axboe@xxxxxxxxx> · Wed, 30 Oct 2024 14:52:46 -0600

On 10/30/24 2:25 PM, Jens Axboe wrote:
> On 10/30/24 11:20 AM, Jann Horn wrote:
>> On Wed, Oct 30, 2024 at 5:58 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>> This avoids array_index_nospec() for repeated lookups on the same node,
>>> which can be quite common (and costly). If a cached node is removed from
>>
>> You're saying array_index_nospec() can be quite costly - which
>> architecture is this on? Is this the cost of the compare+subtract+and
>> making the critical path longer?
> 
> Tested this on arm64, in a vm to be specific. Let me try and generate
> some numbers/profiles on x86-64 as well. It's noticeable there as well,
> though not quite as bad as the below example. For arm64, with the patch,
> we get roughly 8.7% of the time spent getting a resource - without it,
> it's 66% of the time. This is just doing a microbenchmark, but it clearly
> shows that anything following the barrier on arm64 is very costly:
> 
>   0.98 │       ldr   x21, [x0, #96]
>        │     ↓ tbnz  w2, #1, b8
>   1.04 │       ldr   w1, [x21, #144]
>        │       cmp   w1, w19
>        │     ↓ b.ls  a0
>        │ 30:   mov   w1, w1
>        │       sxtw  x0, w19
>        │       cmp   x0, x1
>        │       ngc   x0, xzr
>        │       csdb
>        │       ldr   x1, [x21, #160]
>        │       and   w19, w19, w0
>  93.98 │       ldr   x19, [x1, w19, sxtw #3]
> 
> and accounts for most of that 66% of the total cost of the micro bench,
> even though it's doing a ton more stuff than simply getting this node
> via a lookup.
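
For anyone reading along, the cmp/ngc/and sequence in the listing above is
the index-masking step. A minimal user-space sketch of the same idea follows,
modeled on the generic C fallback for array_index_mask_nospec() in
include/linux/nospec.h. The function names here are made up for illustration,
there is no csdb or other speculation barrier, and the compiler is free to
optimize it differently, so this only shows the arithmetic and gives no
Spectre protection:

```c
#include <limits.h>

/*
 * Hypothetical user-space illustration, not kernel code.
 * Returns ~0UL when index < size, 0 otherwise, without a branch.
 * Relies on arithmetic right shift of a negative signed long, which
 * is implementation-defined in ISO C but is what gcc and clang do.
 */
static unsigned long index_mask_nospec(unsigned long index,
                                       unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >>
           (sizeof(long) * CHAR_BIT - 1);
}

/*
 * Clamp an index: in-range values pass through unchanged,
 * out-of-range values collapse to 0.
 */
static unsigned long index_nospec(unsigned long index, unsigned long size)
{
    return index & index_mask_nospec(index, size);
}
```

The load that follows has to wait for the masked index, which is why the
cost shows up on the final ldr in the profile.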

Ran some x86-64 testing, and there's no such effect there. So the cache
is mostly useful on archs with a more expensive array_index_nospec().
There's obviously still a cost associated with it on x86-64, but it's
more of an even trade-off between the extra branch and the nospec
indexing. Which means that on x86-64 you may as well not add the extra
cache: this particular test always hits it, so it's a best-case kind of
test.
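
To make the trade-off concrete, here is a rough user-space sketch of a
last-lookup cache sitting in front of a masked index. This is not the code
from the patch: struct rsrc_table, rsrc_lookup() and rsrc_remove() are
hypothetical names, and the masking helper is the portable arithmetic only,
with no speculation barrier:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical table layout, for illustration only. */
struct rsrc_table {
    void **nodes;            /* array of node pointers */
    unsigned int nr;         /* number of slots in nodes[] */
    unsigned int last_index; /* index of the most recent hit */
    void *last_node;         /* cached node, NULL if no valid entry */
};

/* ~0UL if index < size, else 0 (arithmetic only, no barrier). */
static unsigned long mask_nospec(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >>
           (sizeof(long) * CHAR_BIT - 1);
}

static void *rsrc_lookup(struct rsrc_table *t, unsigned int index)
{
    void *node;

    /* Fast path: one extra compare and branch, no masked load. */
    if (t->last_node && t->last_index == index)
        return t->last_node;

    if (index >= t->nr)
        return NULL;

    /* Slow path: clamp the index before using it for the load. */
    index = (unsigned int)(index & mask_nospec(index, t->nr));
    node = t->nodes[index];

    t->last_index = index;
    t->last_node = node;
    return node;
}

/* Removing a node must also drop it from the cache. */
static void rsrc_remove(struct rsrc_table *t, unsigned int index)
{
    if (index >= t->nr)
        return;
    t->nodes[index] = NULL;
    if (t->last_node && t->last_index == index)
        t->last_node = NULL;
}
```

A benchmark that looks up the same index in a loop only ever takes the fast
path, which is the best case described above. A workload that alternates
between indices pays for the extra branch and still does the masked load.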

-- 
Jens Axboe