On 11/12/2020 03:37, Keith Busch wrote:
> On Fri, Dec 11, 2020 at 01:44:38AM +0000, Pavel Begunkov wrote:
>> On 11/12/2020 01:19, Andres Freund wrote:
>>> On 2020-12-10 23:15:15 +0000, Pavel Begunkov wrote:
>>>> On 10/12/2020 23:12, Pavel Begunkov wrote:
>>>>> On 10/12/2020 20:51, Andres Freund wrote:
>>>>>> Hi,
>>>>>>
>>>>>> When using hybrid polling (i.e. echo 0 >
>>>>>> /sys/block/nvme1n1/queue/io_poll_delay) I see stalls with fio when using
>>>>>> an iodepth > 1. Sometimes fio hangs, other times the performance is
>>>>>> really poor. I reproduced this with SSDs from different vendors.
>>>>>
>>>>> Can you get poll stats from debugfs while running with hybrid?
>>>>> For both iodepth=1 and 32.
>>>>
>>>> Even better, for 32 show it dynamically, i.e. cat it several
>>>> times while it's running.
>>>
>>> Should read all email before responding...
>>>
>>> This is a loop of grepping for 4k writes (only type I am doing), with 1s
>>> interval. I started it before the fio run (after one with
>>> iodepth=1). Once the iodepth 32 run finished (--timeout 10, but took
>>> 42s), I started a --iodepth 1 run.
>>
>> Thanks! Your mean grows to more than 30s, so it'll sleep for 15s for each
>> IO. Yep, the sleep time calculation is clearly broken for you.
>>
>> In general the current hybrid polling doesn't work well with high QD;
>> that's because the statistics it is based on are not very resilient to all
>> sorts of problems. And it might be the problem I described long ago:
>>
>> https://www.spinics.net/lists/linux-block/msg61479.html
>> https://lkml.org/lkml/2019/4/30/120
>
> It sounds like the statistic is using the wrong criteria. It ought to
> use the average time for the next available completion for any request
> rather than the average latency of a specific IO. It might work at high
> depth if the hybrid poll knew the hctx's depth when calculating the
> sleep time, but that information doesn't appear to be readily available.

It polls (and so sleeps) from submission of a request to its completion,
not from request to request. The other scheme doesn't look like it works
well when you don't have a constant-ish flow of requests, e.g. QD=1 and
with varying latency in userspace.

--
Pavel Begunkov
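
For reference, the adaptive hybrid-poll scheme discussed above sleeps for
roughly half of the observed mean submission-to-completion latency before
busy-polling for the completion. Below is a minimal user-space sketch (not
the kernel code; the per-IO service time, FIFO-completion assumption, and
queue depths are made-up illustration values) of why that per-IO mean, and
therefore the computed sleep, inflates as queue depth grows:

/*
 * Sketch: hybrid poll sleeps ~half of the observed mean per-IO latency.
 * With QD requests in flight and roughly FIFO completion, the j-th queued
 * request waits behind j-1 others, so its submission-to-completion latency
 * is about j * service_time, and the per-IO mean scales with queue depth.
 */
#include <stdio.h>

int main(void)
{
    double service_us = 100.0;          /* assumed per-IO device service time */
    int qd[] = { 1, 32 };

    for (int i = 0; i < 2; i++) {
        /* mean over one batch of qd requests: ~ (qd + 1) / 2 * service time */
        double mean_us = service_us * (qd[i] + 1) / 2.0;
        double sleep_us = mean_us / 2.0;    /* hybrid poll: sleep half the mean */

        printf("QD=%2d: observed mean %.0f us -> hybrid sleep %.0f us\n",
               qd[i], mean_us, sleep_us);
    }
    return 0;
}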