On Tue, Dec 12, 2023 at 11:15:01AM -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 12, 2023 at 11:00:12AM +0800, Jason Wang wrote:
> > On Tue, Dec 12, 2023 at 12:54 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:

We played around with the suggestions and some other ideas.
I would like to share some initial results.

We tried the following:
1. Call unconditional schedule in the vhost_worker function
2. Change the HZ value from 100 to 1000
3. Revert 05bfb338fa8d ("vhost: Fix livepatch timeouts in vhost_worker()")
4. Add a cond_resched to translate_desc
5. Reduce VHOST_NET_WEIGHT to 25% of its original value

Please find the diffs below.

Summary:
Option 1 is very hacky but resolved the regression.
Option 2 reduces the regression by ~20%.
Options 3-5 unfortunately do not help.

Potential explanation:
While the vhost is executing, the need_resched flag is not set (observable
in the traces). Therefore cond_resched and the like do nothing. vhost will
continue executing until the need_resched flag is set by an external party,
e.g. by a request to migrate the vhost.

Calling schedule unconditionally forces the scheduler to re-evaluate all
tasks and their vruntime/deadline/vlag values. The scheduler then comes to
the correct conclusion that the kworker should be executed, and from there
it is smooth sailing. I will have to verify that sequence by collecting
more traces, but this seems rather plausible. This hack might of course
introduce all kinds of side effects, but it might provide an indicator that
this is the actual problem. The big question is how to solve this
conceptually, and, first things first, whether you think this is a viable
hypothesis. (A small userspace sketch of the cond_resched/need_resched
interaction is appended after the diffs.)

Increasing the HZ value most likely helps because the other CPUs take
scheduling/load-balancing decisions more often as well and therefore
trigger the migration faster.

Bringing down VHOST_NET_WEIGHT even further might also help to shorten the
vhost loop, but I have no intuition how low we can/should go here.

We also changed vq_err to print error messages, but did not encounter any.

Diffs:
--------------------------------------------------------------------------
1. Call unconditional schedule in the vhost_worker function

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e0c181ad17e3..16d73fd28831 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -414,6 +414,7 @@ static bool vhost_worker(void *data)
 		}
 	}
 
+	schedule();
 	return !!node;
 }

--------------------------------------------------------------------------
2. Change the HZ value from 100 to 1000

--> config change

--------------------------------------------------------------------------
3. Revert 05bfb338fa8d ("vhost: Fix livepatch timeouts in vhost_worker()")

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e0c181ad17e3..d519d598ebb9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -410,7 +410,8 @@ static bool vhost_worker(void *data)
 			kcov_remote_start_common(worker->kcov_handle);
 			work->fn(work);
 			kcov_remote_stop();
-			cond_resched();
+			if (need_resched())
+				schedule();
 		}
 	}

--------------------------------------------------------------------------
4. Add a cond_resched to translate_desc

I just picked some location.

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e0c181ad17e3..f885dd29cbd1 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2367,6 +2367,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
 		s += size;
 		addr += size;
 		++ret;
+		cond_resched();
 	}
 
 	if (ret == -EAGAIN)

--------------------------------------------------------------------------
5. Reduce VHOST_NET_WEIGHT to 25% of its original value

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f2ed7167c848..2c6966ea6229 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -42,7 +42,7 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
 
 /* Max number of bytes transferred before requeueing the job.
  * Using this limit prevents one virtqueue from starving others. */
-#define VHOST_NET_WEIGHT 0x80000
+#define VHOST_NET_WEIGHT 0x20000
 
 /* Max number of packets transferred before requeueing the job.
  * Using this limit prevents one virtqueue from starving others with small