On 2024-11-21 07:07, Pavel Begunkov wrote:
> On 11/21/24 14:31, Jens Axboe wrote:
>> On 11/21/24 7:25 AM, Pavel Begunkov wrote:
>>> On 11/21/24 01:12, Jens Axboe wrote:
>>>> On 11/20/24 4:56 PM, Pavel Begunkov wrote:
>>>>> On 11/20/24 22:14, David Wei wrote:
> ...
>>>> I think that can only work if we change work_llist to be a regular list
>>>> with regular locking. Otherwise it's a bit of a mess with the list being
>>>
>>> Dylan once measured the overhead of locks vs atomics in this
>>> path for some artificial case, we can pull the numbers up.
>>
>> I did it more recently if you'll remember, actually posted a patch I
>> think a few months ago changing it to that. But even that approach adds
>
> Right, and it'd be a separate topic from this set.
>
>> extra overhead, if you want to add it to the same list as now you need
>
> Extra overhead in the retry path, which is not the hot path,
> and how cold it really is is uncertain.
>
>> to re-grab (and re-disable interrupts) the lock to add it back. My gut
>> says that would be _worse_ than the current approach. And if you keep a
>> separate list instead, well then you're back to identical overhead in
>> terms of now needing to check both when needing to know if anything is
>> pending, and checking both when running it.
>>
>>>> reordered, and then you're spending extra cycles on potentially
>>>> reordering all the entries again.
>>>
>>> That sucks, I agree, but then it's the same question of how often
>>> it happens.
>>
>> At least for now, there's a real issue reported and we should fix it. I
>> think the current patches are fine in that regard. That doesn't mean we
>> can't potentially make it better, we should certainly investigate that.
>> But I don't see the current patches as being suboptimal really, they are
>> definitely good enough as-is for solving the issue.
>
> That's fair enough, but I still would love to know how frequent
> it is. There is no purpose in optimising it as a hot/slow path if
> it triggers every fifth run or such. David, how easy is it to
> get some stats? We can hack up some bpftrace script.
>

Here is a sample distribution of how many task work items are done per
__io_run_local_work() invocation:

@work_done:
[1]              15385954 |@                                                   |
[2, 4)           33424809 |@@@@                                                |
[4, 8)          196055270 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
[8, 16)         419060191 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16, 32)         48395043 |@@@@@@                                              |
[32, 64)          1573469 |                                                    |
[64, 128)           98151 |                                                    |
[128, 256)          14288 |                                                    |
[256, 512)           2035 |                                                    |
[512, 1K)             268 |                                                    |
[1K, 2K)               13 |                                                    |

This workload had wait_nr set to 20 and the timeout set to 500 µs.
Empirically, I know that any invocation that runs more than 50 task work
items will violate the latency limit for this workload, and in those
cases all of the requests must be dropped. So even if excessive task work
happens only a small percentage of the time, its impact is far larger
than that percentage suggests.
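
For reference, a minimal bpftrace sketch of the kind of script that could
produce the histogram above (assumptions: __io_run_local_work() is not
inlined and so is visible to kprobes via kallsyms, and its return value is
the number of task work items the call processed):

// Log2 histogram of task work items run per __io_run_local_work() call.
// retval is the probed function's return value; hist() buckets it and
// bpftrace prints the @work_done map when the script exits (ctrl-C).
kretprobe:__io_run_local_work
{
	@work_done = hist(retval);
}

The same thing fits as a one-liner while the workload is running:

bpftrace -e 'kretprobe:__io_run_local_work { @work_done = hist(retval); }'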