On 14/01/2021 00:46, Marcelo Diop-Gonzalez wrote:
> On Tue, Jan 12, 2021 at 08:47:11PM +0000, Pavel Begunkov wrote:
>> On 08/01/2021 15:57, Marcelo Diop-Gonzalez wrote:
>>> On Sat, Jan 02, 2021 at 08:26:26PM +0000, Pavel Begunkov wrote:
>>>> On 02/01/2021 19:54, Pavel Begunkov wrote:
>>>>> On 19/12/2020 19:15, Marcelo Diop-Gonzalez wrote:
>>>>>> Right now io_flush_timeouts() checks if the current number of events
>>>>>> is equal to ->timeout.target_seq, but this will miss some timeouts if
>>>>>> more than 1 event has been added since the last time they were
>>>>>> flushed (possible in io_submit_flush_completions(), for example). Fix
>>>>>> it by recording the starting value of ->cached_cq_overflow -
>>>>>> ->cq_timeouts instead of the target value, so that we can safely
>>>>>> (without overflow problems) compare the number of events that have
>>>>>> happened with the number of events needed to trigger the timeout.
>>>>
>>>> https://www.spinics.net/lists/kernel/msg3475160.html
>>>>
>>>> The idea was to replace u32 cached_cq_tail with u64 while keeping
>>>> timeout offsets u32. Assuming that we won't ever hit ~2^62 inflight
>>>> requests, complete all requests falling into some large enough window
>>>> behind that u64 cached_cq_tail.
>>>>
>>>> simplifying:
>>>>
>>>> i64 d = target_off - ctx->u64_cq_tail
>>>> if (d <= 0 && d > -2^32)
>>>> 	complete_it()
>>>>
>>>> Not fond of it, but at least it worked at the time. You can try out
>>>> this approach if you want, but it would be perfect if you could find
>>>> something more elegant :)
>>>
>>> What do you think about something like this? I think it's not totally
>>> correct, because it relies on holding ->completion_lock in io_timeout()
>>> so that ->cq_last_tm_flush is updated, but in the IORING_SETUP_IOPOLL
>>> case io_iopoll_complete() doesn't take that lock, and ->uring_lock will
>>> not be held if io_timeout() is called from io_wq_submit_work(). But
>>> maybe it could still be worth it, since that was possibly already a
>>> problem?
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index cb57e0360fcb..50984709879c 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -353,6 +353,7 @@ struct io_ring_ctx {
>>>  	unsigned		cq_entries;
>>>  	unsigned		cq_mask;
>>>  	atomic_t		cq_timeouts;
>>> +	unsigned		cq_last_tm_flush;
>>
>> It looks like that "last flush" is a good direction.
>> I think there can be problems at extremes like completing 2^32
>> requests at once, but it should be ok in practice. Anyway, better
>> than it is now.
>>
>> What about the first patch about overflows and cq_timeouts? I
>> assume that problem is still there, isn't it?
>>
>> See the comments below, but if it passes the liburing tests, please
>> send a patch.
>>
>>>  	unsigned long		cq_check_overflow;
>>>  	struct wait_queue_head	cq_wait;
>>>  	struct fasync_struct	*cq_fasync;
>>> @@ -1633,19 +1634,26 @@ static void __io_queue_deferred(struct io_ring_ctx *ctx)
>>>  
>>>  static void io_flush_timeouts(struct io_ring_ctx *ctx)
>>>  {
>>> +	u32 seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
>>> +
>>
>> a nit,
>>
>> if (list_empty()) return; + do {} while();
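
To spell out the shape I meant (an untested sketch on top of your diff;
->cq_last_tm_flush is the field you add, the rest are existing helpers):

static void io_flush_timeouts(struct io_ring_ctx *ctx)
{
	u32 seq;

	if (list_empty(&ctx->timeout_list))
		return;	/* note: leaves ->cq_last_tm_flush stale, see below */
	seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);

	do {
		u32 events_needed, events_got;
		struct io_kiocb *req = list_first_entry(&ctx->timeout_list,
						struct io_kiocb, timeout.list);

		if (io_is_timeout_noseq(req))
			break;
		/*
		 * Both deltas are taken against the same last-flush point,
		 * so the comparison can't be broken by u32 wrap-around as
		 * long as fewer than ~2^31 events happened in between.
		 */
		events_needed = req->timeout.target_seq - ctx->cq_last_tm_flush;
		events_got = seq - ctx->cq_last_tm_flush;
		if (events_got < events_needed)
			break;

		list_del_init(&req->timeout.list);
		io_kill_timeout(req);
	} while (!list_empty(&ctx->timeout_list));

	ctx->cq_last_tm_flush = seq;
}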
> Ah btw, so then we would have to add ->last_flush = seq in
> io_timeout() too? I think that should be correct, but I just want to
> make sure that's what you meant, because otherwise, if list_empty() is
> true for a while without updating ->last_flush, there could be
> problems. For example, if there are no timeouts for a while and
> seq == 2^32-2, and then we add a timeout with off == 4: if last_flush
> is still 0, then target - last_flush == 2, but seq - last_flush ==
> 2^32-2.

You've just answered your question :) You need to update it somehow,
either unconditionally on commit, or in io_timeout(), or anywhere else.

btw, I like your idea of doing it in io_timeout(): it adds a timeout
anyway, so it makes list_empty() fail and kind of automatically pushes
all that tracking along.

-- 
Pavel Begunkov
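
As a footnote, the wrap-around in the example above is easy to reproduce
in standalone C (the values are the hypothetical ones from the example;
the program itself is only an illustration, not code from the thread):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t last_flush = 0;	/* ->cq_last_tm_flush, never updated */
	uint32_t seq = UINT32_MAX - 1;	/* 2^32 - 2 events seen so far */
	uint32_t off = 4;		/* the new timeout's offset */
	uint32_t target = seq + off;	/* wraps around to 2 */

	uint32_t events_needed = target - last_flush;	/* == 2 */
	uint32_t events_got = seq - last_flush;		/* == 2^32 - 2 */

	/* got >= needed, so the timeout fires ~2^32 events too early */
	printf("needed=%u got=%u fires=%s\n", events_needed, events_got,
	       events_got >= events_needed ? "yes (too early)" : "no");
	return 0;
}

Updating ->cq_last_tm_flush whenever a timeout is queued (the io_timeout()
idea above) keeps last_flush close to seq, so both deltas stay small and
the comparison stays meaningful.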