On 14/01/2021 00:46, Marcelo Diop-Gonzalez wrote:
> On Tue, Jan 12, 2021 at 08:47:11PM +0000, Pavel Begunkov wrote:
>> On 08/01/2021 15:57, Marcelo Diop-Gonzalez wrote:
>>> On Sat, Jan 02, 2021 at 08:26:26PM +0000, Pavel Begunkov wrote:
>>>> On 02/01/2021 19:54, Pavel Begunkov wrote:
>>>>> On 19/12/2020 19:15, Marcelo Diop-Gonzalez wrote:
>>>>>> Right now io_flush_timeouts() checks if the current number of events
>>>>>> is equal to ->timeout.target_seq, but this will miss some timeouts if
>>>>>> more than 1 event has been added since the last time they were
>>>>>> flushed (possible in io_submit_flush_completions(), for example). Fix
>>>>>> it by recording the starting value of ->cached_cq_overflow -
>>>>>> ->cq_timeouts instead of the target value, so that we can safely
>>>>>> (without overflow problems) compare the number of events that have
>>>>>> happened with the number of events needed to trigger the timeout.
>>>>
>>>> https://www.spinics.net/lists/kernel/msg3475160.html
>>>>
>>>> The idea was to replace u32 cached_cq_tail with u64 while keeping
>>>> timeout offsets u32. Assuming that we won't ever hit ~2^62 inflight
>>>> requests, complete all requests falling into some large enough window
>>>> behind that u64 cached_cq_tail.
>>>>
>>>> simplifying:
>>>>
>>>> i64 d = target_off - ctx->u64_cq_tail
>>>> if (d <= 0 && d > -2^32)
>>>> 	complete_it()
>>>>
>>>> Not fond of it, but at least it worked at the time. You can try out
>>>> this approach if you want, but it would be perfect if you could find
>>>> something more elegant :)
>>>
>>> What do you think about something like this? I think it's not totally
>>> correct, because it relies on holding ->completion_lock in io_timeout()
>>> so that ->cq_last_tm_flush is updated, but in the IORING_SETUP_IOPOLL
>>> case io_iopoll_complete() doesn't take that lock, and ->uring_lock will
>>> not be held if io_timeout() is called from io_wq_submit_work(). But
>>> maybe it could still be worth it, since that was possibly already a
>>> problem?
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index cb57e0360fcb..50984709879c 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -353,6 +353,7 @@ struct io_ring_ctx {
>>>  	unsigned		cq_entries;
>>>  	unsigned		cq_mask;
>>>  	atomic_t		cq_timeouts;
>>> +	unsigned		cq_last_tm_flush;
>>
>> It looks like that "last flush" is a good direction.
>> I think there can be problems at extremes like completing 2^32
>> requests at once, but it should be ok in practice. Anyway, better
>> than it is now.
>>
>> What about the first patch about overflows and cq_timeouts? I
>> assume that problem is still there, isn't it?
>>
>> See the comments below, but if it passes the liburing tests, please
>> send a patch.
>>
>>>  	unsigned long		cq_check_overflow;
>>>  	struct wait_queue_head	cq_wait;
>>>  	struct fasync_struct	*cq_fasync;
>>> @@ -1633,19 +1634,26 @@ static void __io_queue_deferred(struct io_ring_ctx *ctx)
>>>  
>>>  static void io_flush_timeouts(struct io_ring_ctx *ctx)
>>>  {
>>> +	u32 seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
>>> +
>>
>> a nit,
>>
>> if (list_empty()) return; + do {} while();
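
To spell out the shape I meant (an untested sketch on top of your diff;
->cq_last_tm_flush is the field you add, the rest are existing helpers):

static void io_flush_timeouts(struct io_ring_ctx *ctx)
{
	u32 seq;

	if (list_empty(&ctx->timeout_list))
		return;	/* note: leaves ->cq_last_tm_flush stale, see below */
	seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);

	do {
		u32 events_needed, events_got;
		struct io_kiocb *req = list_first_entry(&ctx->timeout_list,
						struct io_kiocb, timeout.list);

		if (io_is_timeout_noseq(req))
			break;
		/*
		 * Both deltas are taken against the same last-flush point,
		 * so the comparison can't be broken by u32 wrap-around as
		 * long as fewer than ~2^31 events happened in between.
		 */
		events_needed = req->timeout.target_seq - ctx->cq_last_tm_flush;
		events_got = seq - ctx->cq_last_tm_flush;
		if (events_got < events_needed)
			break;

		list_del_init(&req->timeout.list);
		io_kill_timeout(req);
	} while (!list_empty(&ctx->timeout_list));

	ctx->cq_last_tm_flush = seq;
}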
> Ah btw, so then we would have to add ->last_flush = seq in
> io_timeout() too? I think that should be correct, but I just want to
> make sure that's what you meant, because otherwise, if list_empty() is
> true for a while without updating ->last_flush, there could be
> problems. For example, if there are no timeouts for a while and
> seq == 2^32-2, and then we add a timeout with off == 4: if last_flush
> is still 0, then target - last_flush == 2, but seq - last_flush ==
> 2^32-2.

You've just answered your question :) You need to update it somehow,
either unconditionally on commit, or in io_timeout(), or anywhere else.

btw, I like your idea of doing it in io_timeout(): it adds a timeout
anyway, so it makes list_empty() fail and kind of automatically pushes
all that tracking along.

-- 
Pavel Begunkov
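
As a footnote, the wrap-around in the example above is easy to reproduce
in standalone C (the values are the hypothetical ones from the example;
the program itself is only an illustration, not code from the thread):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t last_flush = 0;	/* ->cq_last_tm_flush, never updated */
	uint32_t seq = UINT32_MAX - 1;	/* 2^32 - 2 events seen so far */
	uint32_t off = 4;		/* the new timeout's offset */
	uint32_t target = seq + off;	/* wraps around to 2 */

	uint32_t events_needed = target - last_flush;	/* == 2 */
	uint32_t events_got = seq - last_flush;		/* == 2^32 - 2 */

	/* got >= needed, so the timeout fires ~2^32 events too early */
	printf("needed=%u got=%u fires=%s\n", events_needed, events_got,
	       events_got >= events_needed ? "yes (too early)" : "no");
	return 0;
}

Updating ->cq_last_tm_flush whenever a timeout is queued (the io_timeout()
idea above) keeps last_flush close to seq, so both deltas stay small and
the comparison stays meaningful.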