On 11/24/2010 03:58 PM, Anthony Liguori wrote:
On 11/24/2010 02:18 AM, Avi Kivity wrote:
On 11/23/2010 06:49 PM, Anthony Liguori wrote:
qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT. Instead of teaching
them to respond to these signals (which cannot be trapped), use SIGUSR1 to
approximate the behavior of SIGSTOP/SIGCONT.
The purpose of this is to implement CPU hard limits using an external tool
that watches the CPU consumption and stops the VCPU as appropriate.
This provides a more elegant solution in that it allows the VCPU thread to
release qemu_mutex before going to sleep.
This current implementation uses a single signal. I think this is too racy
in the long term, so I think we should introduce a second signal. If two
signals get coalesced into one, it could confuse the monitoring tool into
giving the VCPU the inverse of its entitlement.
You can use sigqueue() to send an accompanying value.
I switched to using SIGRTMIN+5 and SIGRTMIN+6. I think that's a nicer
solution since it maps to SIGCONT/SIGSTOP.
These may get reordered, need to check the semantics.
It might be better to simply move this logic entirely into QEMU to make this
more robust--the question is whether we think this is a good long-term
feature to carry in QEMU?
I'm more concerned about lock holder preemption, and interaction of
this mechanism with any kernel solution for LHP.
Can you suggest some scenarios and I'll create some test cases? I'm trying
to figure out the best way to evaluate this.
Booting 64-vcpu Windows on a 64-cpu host with PLE but without directed
yield takes longer than forever because PLE detects contention within
the guest, which under our current PLE implementation (usleep(100))
converts guest contention into delays.
(a directed yield implementation would find that all vcpus are runnable,
yielding optimal results under this test case).
So if you were to test something similar running with a 20% vcpu cap,
I'm sure you'd run into similar issues. It may show with fewer vcpus
(I've only tested 64).
Are you assuming the existence of a directed yield and the specific
concern is what happens when a directed yield happens after a PLE and
the target of the yield has been capped?
Yes. My concern is that we will see the same kind of problems directed
yield was designed to fix, but without allowing directed yield to fix
them. Directed yield was designed to fix lock holder preemption under
contention, now you're inducing contention but not allowing directed
yield to work, even when we will have it.
+static __thread int sigusr1_wfd;
+
+static void on_sigusr1(int signo)
+{
+    char ch = 0;
+    if (write(sigusr1_wfd, &ch, 1) < 0) {
+        /* who cares */
+    }
+}
We do have signalfd().
This is actually called from signalfd. I thought about refactoring
that loop to handle signals directly but since we do this elsewhere I
figured I'd keep things consistent.
Ah, yes.
+
+static void sigusr1_read(void *opaque)
+{
+    CPUState *env = opaque;
+    ssize_t len;
+    int caught_signal = 0;
+
+    do {
+        char buffer[256];
+        len = read(env->sigusr1_fd, buffer, sizeof(buffer));
+        caught_signal = 1;
+    } while (len > 0);
+
+    if (caught_signal) {
+        if (env->stopped) {
env->stopped is multiplexed among multiple users, so this interferes
with vm_stop().
We need to make ->stopped a reference count instead.
Indeed.
We also need to make the global vm_stop() be reference based, since
there are multiple consumers of that interface.
--
error compiling committee.c: too many arguments to function