On Tue, Dec 11, 2018 at 10:00:35AM +0100, Steven Miao (Arm Technology China) wrote:
> Hi Christoffer,
>
> > -----Original Message-----
> > From: Christoffer Dall <christoffer.dall@xxxxxxx>
> > Sent: Monday, December 10, 2018 9:19 PM
> > To: Steven Miao (Arm Technology China) <Steven.Miao@xxxxxxx>
> > Cc: kvmarm@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: KVM arm realtime performance optimization
> >
> > On Mon, Dec 10, 2018 at 05:36:09AM +0000, Steven Miao (Arm Technology
> > China) wrote:
> > >
> > > From: kvmarm-bounces@xxxxxxxxxxxxxxxxxxxxx
> > > <kvmarm-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Steven Miao (Arm
> > > Technology China)
> > > Sent: Thursday, December 6, 2018 3:05 PM
> > > To: kvmarm@xxxxxxxxxxxxxxxxxxxxx
> > > Subject: KVM arm realtime performance optimization
> > >
> > > Hi Everyone,
> > >
> > > I'm currently testing KVM arm realtime performance on a hikey960
> > > board. My test benchmark is cyclictest, which I use to measure
> > > thread wake-up latency on both the Host Linux OS and the KVM Guest
> > > Linux OS.
> > >
> > > Host OS:
> > >
> > > hikey960:/mnt/debian/usr/src/linux# cyclictest -p 99 -t 4 -m -n -a 0-3 -l 100000
> > > # /dev/cpu_dma_latency set to 0us
> > > WARN: Running on unknown kernel version...YMMV
> > > policy: fifo: loadavg: 0.00 0.00 0.00 1/165 3270
> > >
> > > T: 0 ( 3266) P:99 I:1000 C: 100000 Min:   4 Act:  15 Avg:  15 Max:   139
> > > T: 1 ( 3267) P:99 I:1500 C:  66736 Min:   4 Act:  15 Avg:  15 Max:   239
> > > T: 2 ( 3268) P:99 I:2000 C:  50051 Min:   4 Act:  19 Avg:  15 Max:    43
> > > T: 3 ( 3269) P:99 I:2500 C:  40039 Min:   5 Act:  15 Avg:  16 Max:    74
> > >
> > > Guest OS:
> > > root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n -a 0-3 -l 100000
> > > # /dev/cpu_dma_latency set to 0us
> > > WARN: Running on unknown kernel version...YMMV
> > > policy: fifo: loadavg: 0.13 0.05 0.01 1/70 293
> > >
> > > T: 0 (  290) P:99 I:1000 C: 100000 Min:   7 Act:  44 Avg:  85 Max: 16111
> > > T: 1 (  291) P:99 I:1500 C:  66665 Min:   7 Act:  81 Avg:  90 Max: 15306
> > > T: 2 (  292) P:99 I:2000 C:  49995 Min:   7 Act:  88 Avg:  87 Max: 16703
> > > T: 3 (  293) P:99 I:2500 C:  39992 Min:   8 Act:  72 Avg:  97 Max: 14976
> > >
> > >
> > > RT performance on the KVM Guest OS is poor compared to the Host OS.
> > > The average wake-up latency on the Guest OS is about 6 - 7 times
> > > that on the Host OS.
> > > I've tried some configurations to improve RT in KVM, like:
> > > 1 Can be combined with CPU isolation
> > > 2 Host OS and Guest OS use RT preempt kernel
> > > 3 Host CPU avoids frequency changes
> > > 4 Configure NO_HZ_FULL for Guest OS
> > >
> > > There may be a little improvement after applying the above
> > > configurations, but the RT performance is still very poor.
> > >
> > > 5 Guest OS uses idle poll instead of WFI to avoid trapping and
> > > being switched out
> > >
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index 2dc0f84..53aef78 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -83,7 +83,7 @@ void arch_cpu_idle(void)
> > >          * tricks
> > >          */
> > >         trace_cpu_idle_rcuidle(1, smp_processor_id());
> > > -       cpu_do_idle();
> > > +       cpu_relax();
> > >         local_irq_enable();
> > >         trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> > >  }
> > >
> > > root@genericarmv8:~# cyclictest -p 99 -t 4 -m -n -l 100000
> > > # /dev/cpu_dma_latency set to 0us
> > > WARN: Running on unknown kernel version...YMMV
> > > policy: fifo: loadavg: 0.07 0.03 0.00 1/99 328
> > >
> > > T: 0 (  325) P:99 I:1000 C: 100000 Min:   3 Act:   6 Avg:  13 Max:  4999
> > > T: 1 (  326) P:99 I:1500 C:  66659 Min:   5 Act:   7 Avg:  14 Max:  3449
> > > T: 2 (  327) P:99 I:2000 C:  49989 Min:   4 Act:   7 Avg:   9 Max: 11471
> > > T: 3 (  328) P:99 I:2500 C:  39986 Min:   4 Act:  14 Avg:  14 Max: 11253
> > >
> > > Method 5 improves Guest OS RT performance a lot: the average thread
> > > wake-up latency on the Guest OS is almost the same as on the Host
> > > OS, but the Max wake-up latency is still very poor.
> > >
> > > Does anyone have any ideas for improving RT performance on the KVM
> > > Guest OS? Although method 5 improves RT performance on the Guest OS
> > > a lot, I don't think it is a good idea.
> > >
> >
> > This is a known problem and there have been presentations about similar
> > problems on x86 in past KVM Forums.
> >
> > The first thing to do is analyze the critical path that adds latency to
> > a wakeup. One way to do that is to instrument the path by adding time
> > counter reads to the path and figuring out what takes time.
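A minimal user-space sketch of the instrumentation idea quoted above, not code from this thread: record a timestamp at each point of interest, then print the time spent between consecutive points. The names `now_ns`, `trace_mark`, `trace_delta_ns` and `trace_report` are hypothetical, and `clock_gettime(CLOCK_MONOTONIC)` is only a stand-in for whatever counter read would be used inside the real KVM path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

#define MAX_MARKS 16

/* Hypothetical trace buffer: one label and one timestamp per mark. */
struct trace {
	const char *label[MAX_MARKS];
	uint64_t stamp_ns[MAX_MARKS];
	int count;
};

/* Read a monotonic clock in nanoseconds (assumed stand-in for a
 * hardware counter read in the path being analysed). */
uint64_t now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Record a timestamp for one point in the path; extra marks are dropped. */
void trace_mark(struct trace *t, const char *label)
{
	if (t->count >= MAX_MARKS)
		return;
	t->label[t->count] = label;
	t->stamp_ns[t->count] = now_ns();
	t->count++;
}

/* Time between mark i-1 and mark i, or 0 if i is out of range. */
uint64_t trace_delta_ns(const struct trace *t, int i)
{
	if (i <= 0 || i >= t->count)
		return 0;
	return t->stamp_ns[i] - t->stamp_ns[i - 1];
}

/* Print the time spent between each pair of consecutive marks. */
void trace_report(const struct trace *t)
{
	for (int i = 1; i < t->count; i++)
		printf("%s -> %s: %" PRIu64 " ns\n",
		       t->label[i - 1], t->label[i], trace_delta_ns(t, i));
}
```

Typical use would be to call `trace_mark(&t, "label")` at each step of the path under study and `trace_report(&t)` only after the path has finished, so that printing does not disturb the measurement.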
> >
> > One thing you can look at is having a configurable grace period in
> > KVM's block function before the process actually goes to sleep (and
> > calls kvm_vcpu_put and the host scheduler), and see if that helps
> > anything.
> Thanks for your suggestion. I will investigate it further. Some Arm
> server partners have reported that KVM Guest RT latency is somewhat
> higher than on x86.
> >
> > At the end of the day, virtualization is going to add a lot of latency
> > when you have to switch the entire state of your CPU, and in terms of
> > virtual RT, you end up with a very high minimal latency.
> Got it. Hopefully some new hardware features like VHE and direct
> injection of VIRQs can improve the latency.

Just FYI: Those features are not going to help you for wake-up time
latency, at all.

Also, I warn against optimizing specifically for cyclictest. Most
likely you're using cyclictest as some measure for latency for a
particular workload, and you must take that into consideration. For
example, if you care about interrupt latency from a device using a
directly injected LPI, that is going to look very different from going
to sleep and getting a timer interrupt (PPI) waking you up.

Thanks,

    Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
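The grace-period idea mentioned in the reply above (poll briefly before really going to sleep) can be illustrated with a user-space analogue. This is a sketch under stated assumptions, not KVM code: `struct waiter`, `waiter_init`, `waiter_wake`, `waiter_block` and the `grace_ns` tunable are all hypothetical names, and a pthread condition variable stands in for handing the CPU back to the host scheduler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical waiter: an event flag plus a tunable polling window. */
struct waiter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	atomic_bool pending;	/* set by the waker when an event arrives */
	uint64_t grace_ns;	/* how long to poll before really sleeping */
};

static uint64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

void waiter_init(struct waiter *w, uint64_t grace_ns)
{
	int ret = pthread_mutex_init(&w->lock, NULL);

	assert(ret == 0);
	ret = pthread_cond_init(&w->cond, NULL);
	assert(ret == 0);
	(void)ret;
	atomic_init(&w->pending, false);
	w->grace_ns = grace_ns;
}

/* Waker side: publish the event and wake a sleeper, if there is one. */
void waiter_wake(struct waiter *w)
{
	pthread_mutex_lock(&w->lock);
	atomic_store(&w->pending, true);
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Returns true if the event was caught while polling (no sleep needed),
 * false if the caller had to take the blocking path. */
bool waiter_block(struct waiter *w)
{
	uint64_t deadline = mono_ns() + w->grace_ns;

	/* Phase 1: poll for the grace period, staying on the CPU. */
	while (mono_ns() < deadline) {
		if (atomic_exchange(&w->pending, false))
			return true;
	}

	/* Phase 2: give up the CPU and sleep until woken. */
	pthread_mutex_lock(&w->lock);
	while (!atomic_load(&w->pending))
		pthread_cond_wait(&w->cond, &w->lock);
	atomic_store(&w->pending, false);
	pthread_mutex_unlock(&w->lock);
	return false;
}
```

The trade-off is the one the thread hints at: a longer `grace_ns` burns more CPU time while idle but lets short waits finish without paying for a full sleep and wake-up, so the value would need to be measured against the actual workload rather than against cyclictest alone.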