> On 13 Aug 2021, at 16:01, Jan Kara <jack@xxxxxxx> wrote:
> 
> Hi Paolo!
> 
> On Thu 20-05-21 17:05:45, Paolo Valente wrote:
>>> On 5 May 2021, at 18:20, Jan Kara <jack@xxxxxxx> wrote:
>>> 
>>> Hi Paolo!
>>> 
>>> I have two processes doing direct IO writes like:
>>> 
>>> dd if=/dev/zero of=/mnt/file$i bs=128k oflag=direct count=4000M
>>> 
>>> Now each of these processes belongs to a different cgroup and has a
>>> different bfq.weight. I was looking into why these processes do not
>>> split bandwidth according to BFQ weights. Or actually, the bandwidth
>>> is split accordingly initially but eventually degrades into a 50/50
>>> split. After some debugging I've found out that, due to bad luck, one
>>> of the processes gets detected as a waker of the other process, and
>>> at that point we lose isolation between the two cgroups. This pretty
>>> reliably happens sometime during the run of these two processes on my
>>> test VM. So can we tweak the waker logic to reduce the chances of
>>> false positives? Essentially, when there are only two processes doing
>>> heavy IO against the device, the logic in bfq_check_waker() is such
>>> that they are very likely to eventually become wakers of one another.
>>> AFAICT the only condition that needs to be fulfilled is that they
>>> submit IO within 4 ms of the completion of IO of the other process,
>>> 3 times.
>>> 
>> 
>> Hi Jan!
>> As I happened to tell you months ago, I feared that some corner case
>> like this would show up eventually. Actually, I was even more
>> pessimistic than reality proved to be :)
>> 
>> I'm sorry for my delay, but I've had to think about this issue for a
>> while. Being too strict would easily rule out journald as a waker for
>> processes belonging to a different group.
>> 
>> So, what do you think of this proposal: add the extra filter that a
>> waker must belong to the same group as the woken queue, or, at most,
>> to the root group?
> 
> Returning back to this :). I've been debugging other BFQ problems with
> IO priorities not really leading to service differentiation (mostly
> because of scheduler tag exhaustion, false waker detection, and how we
> inject IO for a waker) and as a result I have come up with a couple of
> patches that also address this issue as a side effect - I've added an
> upper time limit (128*slice_idle) for the "third cooperation" detection
> and that mostly got rid of these false waker detections.

Great!

> We could fail to detect waker-wakee processes if they do not cooperate
> frequently, but then the value of the detection is small and the lack
> of isolation may do more harm than good anyway.

IIRC, dbench was our best benchmark for checking whether the detection
is (still) effective.

> Currently I'm running a wider set of benchmarks for the patches to see
> whether I didn't regress anything else. If not, I'll post the patches
> to the list.

Any news?

Thanks,
Paolo

> 								Honza
> -- 
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
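
For context, here is a rough user-space sketch of the two ideas discussed
above: the extra cgroup filter Paolo proposes (a waker must be in the woken
queue's group, or at most in the root group) and the 128*slice_idle cap Jan
mentions for the three-hit detection counter. The struct, the function name,
the cgroup-id encoding, and the default slice_idle value are illustrative
stand-ins, not the actual bfq-iosched.c code or the posted patches.

```c
/*
 * Toy model of the waker-detection counter: a candidate queue becomes a
 * "waker" only after triggering the heuristic three times, all within a
 * 128 * slice_idle window, and (per the proposal above) only if it lives
 * in the woken queue's cgroup or in the root group.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_MSEC		1000000ULL
#define SLICE_IDLE_NS		(8 * NSEC_PER_MSEC)	/* assumed default slice_idle */
#define WAKER_DETECTION_WINDOW	(128 * SLICE_IDLE_NS)	/* time cap discussed above */

struct model_queue {
	int cgroup_id;			/* stand-in for the queue's blkcg; 0 = root */
	uint64_t detection_started;	/* timestamp of the first hit in the window */
	int num_detections;		/* hits counted inside the current window */
	bool is_waker;
};

/*
 * Called each time the candidate issues IO within 4 ms of a completion of
 * the woken queue (the existing heuristic); returns true once the candidate
 * is considered a waker.
 */
static bool waker_hit(struct model_queue *cand, const struct model_queue *woken,
		      uint64_t now_ns)
{
	/* Proposed extra filter: same group as the woken queue, or root. */
	if (cand->cgroup_id != woken->cgroup_id && cand->cgroup_id != 0)
		return false;

	/* Time cap: restart counting if the detection window has expired. */
	if (cand->detection_started + WAKER_DETECTION_WINDOW < now_ns) {
		cand->detection_started = now_ns;
		cand->num_detections = 1;
	} else {
		cand->num_detections++;
	}

	if (cand->num_detections >= 3)
		cand->is_waker = true;
	return cand->is_waker;
}

int main(void)
{
	struct model_queue dd1 = { .cgroup_id = 1 };
	struct model_queue dd2 = { .cgroup_id = 2 };

	/* Two heavy writers in different cgroups: the group filter alone
	 * prevents them from ever becoming wakers of each other. */
	for (uint64_t t = 0; t < 10 * WAKER_DETECTION_WINDOW; t += SLICE_IDLE_NS)
		waker_hit(&dd1, &dd2, t);
	printf("dd1 detected as waker of dd2: %s\n", dd1.is_waker ? "yes" : "no");

	/* Same cgroup, but hits spaced wider than the window: the time cap
	 * keeps resetting the counter, so no waker is detected either. */
	struct model_queue slow = { .cgroup_id = 2 };
	for (uint64_t t = 0; t < 10 * WAKER_DETECTION_WINDOW; t += 2 * WAKER_DETECTION_WINDOW)
		waker_hit(&slow, &dd2, t);
	printf("slow detected as waker of dd2: %s\n", slow.is_waker ? "yes" : "no");

	return 0;
}
```

In this model, the two dd-like writers from Jan's test never promote each
other to waker status, while a genuinely cooperating pair in the same group
(three hits close together in time) still would.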