On Wed, Dec 13, 2023 at 11:37:23AM +0100, Tobias Huschle wrote:
> On Tue, Dec 12, 2023 at 11:15:01AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Dec 12, 2023 at 11:00:12AM +0800, Jason Wang wrote:
> > > On Tue, Dec 12, 2023 at 12:54 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>
> We played around with the suggestions and some other ideas.
> I would like to share some initial results.
>
> We tried the following:
>
> 1. Call unconditional schedule in the vhost_worker function
> 2. Change the HZ value from 100 to 1000
> 3. Revert 05bfb338fa8d ("vhost: Fix livepatch timeouts in vhost_worker()")
> 4. Add a cond_resched to translate_desc
> 5. Reduce VHOST_NET_WEIGHT to 25% of its original value
>
> Please find the diffs below.
>
> Summary:
>
> Option 1 is very, very hacky but resolved the regression.
> Option 2 reduces the regression by ~20%.
> Options 3-5 unfortunately do not help.
>
> Potential explanation:
>
> While the vhost is executing, the need_resched flag is not set (observable
> in the traces). Therefore cond_resched and the like do nothing. vhost
> will continue executing until the need_resched flag is set by an external
> party, e.g. by a request to migrate the vhost.
>
> Calling schedule unconditionally forces the scheduler to re-evaluate all
> tasks and their vruntime/deadline/vlag values. The scheduler comes to the
> correct conclusion that the kworker should be executed, and from there it
> is smooth sailing. I will have to verify that sequence by collecting more
> traces, but this seems rather plausible.
> This hack might of course introduce all kinds of side effects, but it might
> provide an indicator that this is the actual problem.
> The big question would be how to solve this conceptually, and, first
> things first, whether you think this is a viable hypothesis.
>
> Increasing the HZ value most likely helps because the other CPUs make
> scheduling/load-balancing decisions more often as well and therefore
> trigger the migration faster.
>
> Bringing down VHOST_NET_WEIGHT even further might also help to shorten the
> vhost loop. But I have no intuition for how low we can/should go here.
>
>
> We also changed vq_err to print error messages, but did not encounter any.
>
> Diffs:
> --------------------------------------------------------------------------
>
> 1. Call unconditional schedule in the vhost_worker function
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e0c181ad17e3..16d73fd28831 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -414,6 +414,7 @@ static bool vhost_worker(void *data)
>                  }
>          }
>
> +        schedule();
>          return !!node;
>  }

So, this helps. But this is very surprising!

static int vhost_task_fn(void *data)
{
        struct vhost_task *vtsk = data;
        bool dead = false;

        for (;;) {
                bool did_work;

                if (!dead && signal_pending(current)) {
                        struct ksignal ksig;
                        /*
                         * Calling get_signal will block in SIGSTOP,
                         * or clear fatal_signal_pending, but remember
                         * what was set.
                         *
                         * This thread won't actually exit until all
                         * of the file descriptors are closed, and
                         * the release function is called.
                         */
                        dead = get_signal(&ksig);
                        if (dead)
                                clear_thread_flag(TIF_SIGPENDING);
                }

                /* mb paired w/ vhost_task_stop */
                set_current_state(TASK_INTERRUPTIBLE);

                if (test_bit(VHOST_TASK_FLAGS_STOP, &vtsk->flags)) {
                        __set_current_state(TASK_RUNNING);
                        break;
                }

                did_work = vtsk->fn(vtsk->data);
                if (!did_work)
                        schedule();
        }

        complete(&vtsk->exited);
        do_exit(0);
}

Apparently schedule is already called?

-- 
MST
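For reference, the branch in question, annotated (a minimal reading that assumes,
as the diff above suggests, that vtsk->fn here is vhost_worker(), so did_work
reflects its !!node return value):

                /*
                 * did_work is the return value of vhost_worker(): true
                 * whenever a batch of queued work was found and run.
                 */
                did_work = vtsk->fn(vtsk->data);
                if (!did_work)
                        schedule();     /* only reached when no work was found */

Under that reading, the per-iteration schedule() is skipped for as long as
vhost_worker() keeps finding queued work, which would be consistent with only
the unconditional schedule() of option 1 making a difference.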