On 12/8/23 3:24 AM, Tobias Huschle wrote:
> On Thu, Dec 07, 2023 at 01:48:40AM -0500, Michael S. Tsirkin wrote:
>> On Thu, Dec 07, 2023 at 07:22:12AM +0100, Tobias Huschle wrote:
>>> 3. vhost looping endlessly, waiting for kworker to be scheduled
>>>
>>> I dug a little deeper on what the vhost is doing. I'm not an expert on
>>> virtio whatsoever, so these are just educated guesses that maybe
>>> someone can verify/correct. Please bear with me probably messing up
>>> the terminology.
>>>
>>> - vhost is looping through available queues.
>>> - vhost wants to wake up a kworker to process a found queue.
>>> - kworker does something with that queue and terminates quickly.
>>>
>>> What I found by throwing in some very noisy trace statements was that,
>>> if the kworker is not woken up, the vhost just keeps looping across
>>> all available queues (and seems to repeat itself). So it essentially
>>> relies on the scheduler to schedule the kworker fast enough. Otherwise
>>> it will just keep on looping until it is migrated off the CPU.
>>
>>
>> Normally it takes the buffers off the queue and is done with it.
>> I am guessing that at the same time guest is running on some other
>> CPU and keeps adding available buffers?
>>
>
> It seems to do just that, there are multiple other vhost instances
> involved which might keep filling up those queues.
>
> Unfortunately, this makes the problematic vhost instance stay on
> the CPU and prevents said kworker from getting scheduled. The kworker is
> explicitly woken up by vhost, so it wants it to do something.
>
> At this point it seems that there is an assumption about the scheduler
> in place which is no longer fulfilled by EEVDF. From the discussion so
> far, it seems like EEVDF does what it is intended to do.
>
> Shouldn't there be a more explicit mechanism in use that allows the
> kworker to be scheduled in favor of the vhost?
>
> It is also concerning that the vhost apparently cannot be preempted by
> the scheduler while executing that loop.
>

Hey,

I recently noticed this change:

commit 05bfb338fa8dd40b008ce443e397fc374f6bd107
Author: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Date:   Fri Feb 24 08:50:01 2023 -0800

    vhost: Fix livepatch timeouts in vhost_worker()

We used to do:

while (1)
	for each vhost work item in list
		execute work item
		if (need_resched())
			schedule();

and after that patch we do:

while (1)
	for each vhost work item in list
		execute work item
		cond_resched()

Would the need_resched check we used to have give you what you wanted?
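
To make the difference between the two loop shapes concrete, here is a toy
userspace model. It is not kernel code: struct sim and every sim_* helper are
made up for illustration, and the cond_resched_noop flag only encodes an
assumption, namely that there may be configurations in which cond_resched()
does not end up calling into the scheduler. Whether that applies to the setup
discussed here is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
struct sim {
	bool resched_pending;   /* stands in for the need_resched() flag */
	bool cond_resched_noop; /* assumption: cond_resched() does nothing */
	int tick_every;         /* a "tick" requests a resched every N items */
	int yields;             /* how often the worker gave up the CPU */
};

static bool sim_need_resched(const struct sim *s)
{
	return s->resched_pending;
}

/* Giving up the CPU clears the pending request and is counted. */
static void sim_schedule(struct sim *s)
{
	s->resched_pending = false;
	s->yields++;
}

/* Modelled cond_resched(): yields only if it is not a no-op. */
static void sim_cond_resched(struct sim *s)
{
	if (s->cond_resched_noop)
		return;
	if (sim_need_resched(s))
		sim_schedule(s);
}

/* "Executing" item i may coincide with a tick that requests a resched. */
static void sim_execute_item(struct sim *s, int i)
{
	if (s->tick_every > 0 && (i + 1) % s->tick_every == 0)
		s->resched_pending = true;
}

/* Old loop shape: explicit need_resched() check, then schedule(). */
int run_old_loop(struct sim *s, int n_items)
{
	assert(n_items >= 0);
	for (int i = 0; i < n_items; i++) {
		sim_execute_item(s, i);
		if (sim_need_resched(s))
			sim_schedule(s);
	}
	return s->yields;
}

/* New loop shape: cond_resched() after every work item. */
int run_new_loop(struct sim *s, int n_items)
{
	assert(n_items >= 0);
	for (int i = 0; i < n_items; i++) {
		sim_execute_item(s, i);
		sim_cond_resched(s);
	}
	return s->yields;
}
```

In this model both loops yield equally often as long as sim_cond_resched()
actually reschedules. They only diverge when cond_resched_noop is set, in
which case the new loop never yields on its own and the pending request stays
set until something outside the loop acts on it.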