On Fri, Sep 22, 2017 at 09:23:47AM +0200, Paolo Bonzini wrote:
> On 22/09/2017 03:08, Marcelo Tosatti wrote:
> > On Thu, Sep 21, 2017 at 03:49:33PM +0200, Paolo Bonzini wrote:
> >> On 21/09/2017 15:32, Konrad Rzeszutek Wilk wrote:
> >>> So the guest can change the scheduling decisions at the host level?
> >>> And the host HAS to follow it? There is no policy override for the
> >>> host to say - nah, not going to do it?
> >
> > In that case the host should not even configure the guest with this
> > option (this is QEMU's 'enable-rt-fifo-hc' option).
> >
> >>> Also wouldn't the guest want to always be at SCHED_FIFO? [I am thinking
> >>> of a guest admin who wants all the CPU resources he can get]
> >
> > No. Because in the following code, executed by the housekeeping vCPU
> > running at constant SCHED_FIFO priority:
> >
> >     1. Start disk I/O.
> >     2. busy spin
> >
> > With the emulator thread sharing the same pCPU with the housekeeping
> > vCPU, the emulator thread (which runs at SCHED_NORMAL), will never
> > be scheduled in in place of the vcpu thread at SCHED_FIFO.
> >
> > This causes a hang.
>
> But if the emulator thread can interrupt the housekeeping thread, the
> emulator thread should also be SCHED_FIFO at higher priority; IIRC this
> was in Jan's talk from a few years ago.

The point is that we do not want the emulator thread to interrupt the
housekeeping thread at all times: we only want it to interrupt the
housekeeping thread when that thread is not in a spinlock protected
section (because preempting it there affects the realtime vCPUs that
attempt to grab that particular spinlock).

Outside such a section, it can interrupt the housekeeping thread.

> QEMU would also have to use PI mutexes (which is the main reason why
> it's using QemuMutex instead of e.g. GMutex).
>
> >> Yeah, I do not understand why there should be a housekeeping VCPU that
> >> is running at SCHED_NORMAL. If it hurts, don't do it...
> >
> > Hope explanation above makes sense (in fact, it was you who pointed
> > out SCHED_FIFO should not be constant on the housekeeping vCPU,
> > when sharing pCPU with emulator thread at SCHED_NORMAL).
>
> The two are not exclusive... As you point out, it depends on the
> workload. For DPDK you can put both of them at SCHED_NORMAL. For
> kernel-intensive uses you must use SCHED_FIFO.
>
> Perhaps we could consider running these threads at SCHED_RR instead.
> Unlike SCHED_NORMAL, I am not against a hypercall that bumps temporarily
> SCHED_RR to SCHED_FIFO, but perhaps that's not even necessary.

Sorry Paolo, I don't see how SCHED_RR is going to help here:

"   SCHED_RR: Round-robin scheduling
       SCHED_RR is a simple enhancement of SCHED_FIFO. Everything
       described above for SCHED_FIFO also applies to SCHED_RR, except
       that each thread is allowed to run only for a maximum time
       quantum."

What must happen is that vcpu0 should run _until it's finished with the
spinlock protected section_ (that is, any job the emulator thread
has, in that period where vcpu0 has work to do, is of lower priority
and must not execute). Otherwise vcpu1, running a realtime workload,
will attempt to grab the spinlock vcpu0 has grabbed, and busy
spin waiting on the emulator thread to finish.

If you have the emulator thread at a higher priority than vcpu0,
as you suggested above, the same problem will happen. So that option is
not viable.

We tried to have vcpu0 at SCHED_FIFO at all times, to avoid this
hypercall, but unfortunately that causes the hang described in the
trace.

So I fail to see how SCHED_RR would help here.

Thanks
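
To make the three configurations discussed above concrete, here is a toy
userspace model. It is not QEMU or KVM code, only a sketch under stated
assumptions: one pCPU shared by vcpu0 and the emulator thread, time in
discrete ticks, a SCHED_FIFO thread always wins over a SCHED_NORMAL one
(RT throttling is ignored), and two SCHED_NORMAL threads simply alternate.
The names simulate(), enum policy, struct result and the tick constants
are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies, named after the options in this thread. */
enum policy {
    VCPU0_ALWAYS_FIFO,     /* vcpu0 SCHED_FIFO, emulator SCHED_NORMAL */
    EMULATOR_HIGHER_FIFO,  /* emulator SCHED_FIFO above vcpu0 */
    FIFO_ONLY_IN_LOCK      /* vcpu0 boosted only while holding the lock */
};

struct result {
    bool hung;            /* vcpu0's disk I/O never completed */
    int lock_hold_ticks;  /* wall-clock ticks the spinlock stayed held */
};

/* Illustrative constants, all in ticks (assumptions, not measurements). */
enum { LOCK_WORK = 3, EMU_ARRIVAL = 1, EMU_BACKLOG = 5, IO_WORK = 2,
       MAX_TICKS = 1000 };

static struct result simulate(enum policy p)
{
    struct result r = { .hung = true, .lock_hold_ticks = 0 };
    int lock_left = LOCK_WORK;   /* CPU ticks vcpu0 still needs in the lock */
    int emu_work = 0;            /* CPU ticks of pending emulator work */
    bool last_was_vcpu0 = false;

    for (int tick = 0; tick < MAX_TICKS; tick++) {
        bool in_lock = lock_left > 0;
        bool run_emu;

        if (tick == EMU_ARRIVAL)
            emu_work += EMU_BACKLOG;  /* unrelated emulator job arrives */

        /* Pick who gets the shared pCPU for this tick. */
        if (emu_work == 0)
            run_emu = false;
        else if (p == EMULATOR_HIGHER_FIFO)
            run_emu = true;           /* emulator preempts vcpu0 */
        else if (p == VCPU0_ALWAYS_FIFO || in_lock)
            run_emu = false;          /* vcpu0 is FIFO, emulator starves */
        else
            run_emu = last_was_vcpu0; /* both SCHED_NORMAL: alternate */
        last_was_vcpu0 = !run_emu;

        if (in_lock)
            r.lock_hold_ticks++;      /* vcpu1 would be spinning here */

        if (run_emu) {
            emu_work--;
        } else if (in_lock && --lock_left == 0) {
            emu_work += IO_WORK;      /* unlock, start disk I/O, then spin */
        }
        /* otherwise vcpu0 just busy spins waiting for the I/O */

        assert(emu_work >= 0 && lock_left >= 0);
        if (lock_left == 0 && emu_work == 0) {
            r.hung = false;           /* emulator finished the I/O */
            break;
        }
    }
    return r;
}
```

With these constants, simulate(VCPU0_ALWAYS_FIFO) never completes the I/O
(hung), simulate(EMULATOR_HIGHER_FIFO) completes it but stretches the lock
hold from 3 to 8 ticks (the time vcpu1 would spend spinning), and
simulate(FIFO_ONLY_IN_LOCK) completes the I/O with a 3-tick lock hold.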