On 22/09/2017 03:08, Marcelo Tosatti wrote:
> On Thu, Sep 21, 2017 at 03:49:33PM +0200, Paolo Bonzini wrote:
>> On 21/09/2017 15:32, Konrad Rzeszutek Wilk wrote:
>>> So the guest can change the scheduling decisions at the host level?
>>> And the host HAS to follow it? There is no policy override for the
>>> host to say - nah, not going to do it?
>
> In that case the host should not even configure the guest with this
> option (this is QEMU's 'enable-rt-fifo-hc' option).
>
>>> Also wouldn't the guest want to always be at SCHED_FIFO? [I am thinking
>>> of a guest admin who wants all the CPU resources he can get]
>
> No. Because in the following code, executed by the housekeeping vCPU
> running at constant SCHED_FIFO priority:
>
> 1. Start disk I/O.
> 2. busy spin
>
> With the emulator thread sharing the same pCPU with the housekeeping
> vCPU, the emulator thread (which runs at SCHED_NORMAL), will never
> be scheduled in in place of the vcpu thread at SCHED_FIFO.
>
> This causes a hang.

But if the emulator thread can interrupt the housekeeping thread, the
emulator thread should also be SCHED_FIFO at higher priority; IIRC this
was in Jan's talk from a few years ago.

QEMU would also have to use PI mutexes (which is the main reason why
it's using QemuMutex instead of e.g. GMutex).

>> Yeah, I do not understand why there should be a housekeeping VCPU that
>> is running at SCHED_NORMAL. If it hurts, don't do it...
>
> Hope explanation above makes sense (in fact, it was you who pointed
> out SCHED_FIFO should not be constant on the housekeeping vCPU,
> when sharing pCPU with emulator thread at SCHED_NORMAL).

The two are not exclusive... As you point out, it depends on the
workload. For DPDK you can put both of them at SCHED_NORMAL. For
kernel-intensive uses you must use SCHED_FIFO.

Perhaps we could consider running these threads at SCHED_RR instead.
Unlike the SCHED_NORMAL case, I am not against a hypercall that temporarily
bumps SCHED_RR to SCHED_FIFO, but perhaps that's not even necessary.

Paolo
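A minimal C sketch of the two ingredients discussed above: giving the emulator thread a real-time priority strictly above the vCPU thread (so it can preempt a vCPU that busy-spins waiting for I/O), and using priority-inheritance (PI) mutexes so a lower-priority lock holder is boosted while a higher-priority thread waits. It assumes Linux with glibc pthreads. The helpers `init_pi_mutex`, `emulator_prio_above` and `set_rt` are hypothetical names for illustration, this is not QEMU's code, and "one level above the vCPU" is an arbitrary illustrative choice.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: initialise a mutex with the PTHREAD_PRIO_INHERIT
 * protocol, so that while a higher-priority thread blocks on it, the
 * current owner runs at the waiter's priority. */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);
    if (ret)
        return ret;
    ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (ret == 0)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Hypothetical helper: choose an emulator-thread priority strictly above
 * the vCPU thread's, within the range allowed for `policy` (SCHED_FIFO or
 * SCHED_RR). Returns -1 if no higher priority exists. */
static int emulator_prio_above(int policy, int vcpu_prio)
{
    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);
    if (min < 0 || max < 0 || vcpu_prio < min || vcpu_prio >= max)
        return -1;
    return vcpu_prio + 1;
}

/* Hypothetical helper: apply a real-time policy and priority to a thread.
 * Returns 0 or an error number (e.g. EPERM without CAP_SYS_NICE or a
 * sufficient RLIMIT_RTPRIO); pthread_setschedparam does not set errno. */
static int set_rt(pthread_t t, int policy, int prio)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;
    return pthread_setschedparam(t, policy, &sp);
}
```

With SCHED_RR instead of SCHED_FIFO, threads at the same priority are time-sliced rather than running until they block or yield, which is why SCHED_RR is a gentler default for threads that share a pCPU; the emulator thread would still need a strictly higher priority to preempt a spinning vCPU.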