On 3/11/24 17:05, Michael S. Tsirkin wrote:
> On Thu, Feb 01, 2024 at 12:47:39PM +0100, Tobias Huschle wrote:
>> On Thu, Feb 01, 2024 at 03:08:07AM -0500, Michael S. Tsirkin wrote:
>>> On Thu, Feb 01, 2024 at 08:38:43AM +0100, Tobias Huschle wrote:
>>>> On Sun, Jan 21, 2024 at 01:44:32PM -0500, Michael S. Tsirkin wrote:
>>>>> On Mon, Jan 08, 2024 at 02:13:25PM +0100, Tobias Huschle wrote:
>>>>>> On Thu, Dec 14, 2023 at 02:14:59AM -0500, Michael S. Tsirkin wrote:
>>>>
>>>> -------- Summary --------
>>>>
>>>> In my (non-vhost experience) opinion the way to go would be either
>>>> replacing the cond_resched with a hard schedule or setting the
>>>> need_resched flag within vhost if a data transfer was successfully
>>>> initiated. It will be necessary to check if this causes problems with
>>>> other workloads/benchmarks.
>>>
>>> Yes but conceptually I am still in the dark on whether the fact that
>>> periodically invoking cond_resched is no longer sufficient to be nice to
>>> others is a bug, or intentional. So you feel it is intentional?
>>
>> I would assume that cond_resched is still a valid concept.
>> But, in this particular scenario we have the following problem:
>>
>> So far (with CFS) we had:
>> 1. vhost initiates data transfer
>> 2. kworker is woken up
>> 3. CFS gives priority to woken up task and schedules it
>> 4. kworker runs
>>
>> Now (with EEVDF) we have:
>> 0. In some cases, kworker has accumulated negative lag
>> 1. vhost initiates data transfer
>> 2. kworker is woken up
>> -3a. EEVDF does not schedule kworker if it has negative lag
>> -4a. vhost continues running, kworker on same CPU starves
>> --
>> -3b. EEVDF schedules kworker if it has positive or no lag
>> -4b. kworker runs
>>
>> In the 3a/4a case, the kworker is given no chance to set the
>> necessary flag. The flag can only be set by another CPU now.
>> The schedule of the kworker was not caused by cond_resched, but
>> rather by the wakeup path of the scheduler.
>>
>> cond_resched works successfully once the load balancer (I suppose)
>> decides to migrate the vhost off to another CPU. In that case, the
>> load balancer on another CPU sets that flag and we are good.
>> That then eventually allows the scheduler to pick kworker, but very
>> late.
>
> Are we going anywhere with this btw?
>

I think Tobias had a couple other threads related to this, with other
potential fixes:

https://lore.kernel.org/lkml/20240228161018.14253-1-huschle@xxxxxxxxxxxxx/
https://lore.kernel.org/lkml/20240228161023.14310-1-huschle@xxxxxxxxxxxxx/
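
For readers skimming the summary quoted above, here is a rough sketch of
what the two options ("hard schedule" vs. "set need_resched within vhost")
could look like in a simplified vhost-style work loop. This is not the
actual drivers/vhost/vhost.c code and not a proposed patch: vhost_work,
VHOST_WORK_QUEUED and work->fn follow the vhost naming, while the function
name and surrounding structure are assumed purely for illustration.

/*
 * Illustrative sketch only; not the real vhost work loop and not a patch.
 */
#include <linux/llist.h>
#include <linux/sched.h>

#include "vhost.h"      /* struct vhost_work, VHOST_WORK_QUEUED (assumed) */

static void vhost_run_work_sketch(struct llist_node *node)
{
        struct vhost_work *work, *work_next;

        llist_for_each_entry_safe(work, work_next, node, node) {
                clear_bit(VHOST_WORK_QUEUED, &work->flags);
                work->fn(work);

                /*
                 * Option 1: replace cond_resched() with a hard schedule,
                 * so a kworker that was just woken gets a chance to run
                 * even if EEVDF would not preempt vhost on wakeup.
                 */
                schedule();

                /*
                 * Option 2 (alternative): only mark the vhost task as
                 * needing to reschedule; the next preemption point then
                 * performs the switch.
                 *
                 * set_tsk_need_resched(current);
                 */
        }
}

Either way, as noted in the summary, the effect on other workloads and
benchmarks would need to be measured before anything like this is pursued.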