On 28.04.21 10:46, Peter Zijlstra wrote: [..]
The right thing to do here is to analyze the situation and determine why migration_cost needs changing; is that an architectural thing, does s390 benefit from less sticky tasks due to its cache setup (the book caches could be absorbing some of the penalties here for example). Or is it something that's workload related, does KVM intrinsically not care about migrating so much, or is it something else.
So let's focus on the performance issue. One workload where we have seen this is a transactional workload that is triggered by external network requests. Every external request triggered a wakeup of a guest and a wakeup of a process in the guest. The end result was that KVM was 40% slower than z/VM (in terms of transactions per second) while we had more idle time. With a smaller sched_migration_cost_ns (e.g. 100000) KVM was as fast as z/VM. So to me it looks like the wakeup and reschedule to a free CPU was just not fast enough. It might also depend on where I/O interrupts land. Not sure yet.
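For anyone who wants to repeat the comparison, here is a minimal sketch of how the tunable could be read and changed from a script. The two file locations are assumptions about where the knob is exposed (a sysctl on older kernels, debugfs on newer ones), and the helper names are made up for illustration; writing requires root.

```python
from pathlib import Path

# Assumed locations of the tunable; which one exists depends on the kernel
# version (sysctl on older kernels, debugfs on newer ones).
CANDIDATE_PATHS = (
    Path("/proc/sys/kernel/sched_migration_cost_ns"),
    Path("/sys/kernel/debug/sched/migration_cost_ns"),
)


def find_tunable(candidates=CANDIDATE_PATHS):
    """Return the first candidate path that exists, or None if none does."""
    for path in candidates:
        if Path(path).exists():
            return Path(path)
    return None


def read_migration_cost(path):
    """Read the current migration cost in nanoseconds from the given file."""
    return int(Path(path).read_text().strip())


def write_migration_cost(path, value_ns):
    """Write a new migration cost in nanoseconds (needs root on a real system)."""
    if value_ns < 0:
        raise ValueError("migration cost must not be negative")
    Path(path).write_text(f"{value_ns}\n")
```

A run would call find_tunable(), note the old value with read_migration_cost(), set 100000 with write_migration_cost(), rerun the workload, and restore the old value afterwards.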