Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> writes:

> Michael Kelley <mikelley@xxxxxxxxxxxxx> writes:
>
>> I talked to KY Srinivasan about the history of the TSC page on 32-bit. He said
>> there was no technical reason not to implement it, but our focus was always
>> 64-bit Linux, so 32-bit was much less important. Also, on 32-bit Linux,
>> the required 64x64 multiply and shift is more complex and takes more
>> cycles (compare the 32-bit implementation of mul_u64_u64_shr vs.
>> the 64-bit implementation), so the win over an MSR read is smaller. I
>> don't know of any actual measurements being made to compare vs. an
>> MSR read.
>
> A VMExit costs 1000 CPU cycles or so, so I would guess that the TSC page
> calculations are cheaper. Let me try to build a 32-bit kernel and do some
> quick measurements.

So I tried, and the difference is HUGE. For in-kernel clocksource reads
(like sched_clock()), the testing code was:

	before = rdtsc_ordered();
	for (i = 0; i < 1000; i++)
		(void)read_hv_sched_clock_msr();
	after = rdtsc_ordered();
	printk("MSR based clocksource: %u cycles\n", ((u32)(after - before)) / 1000);

	before = rdtsc_ordered();
	for (i = 0; i < 1000; i++)
		(void)read_hv_sched_clock_tsc();
	after = rdtsc_ordered();
	printk("TSC page clocksource: %u cycles\n", ((u32)(after - before)) / 1000);

The result (WS2016) is:

[    1.101910] MSR based clocksource: 3361 cycles
[    1.105224] TSC page clocksource: 49 cycles

For userspace reads the absolute difference is even bigger, as the TSC page
gives us a functional vDSO. Testing code:

	before = rdtsc();
	for (i = 0; i < COUNT; i++)
		clock_gettime(CLOCK_REALTIME, &tp);
	after = rdtsc();
	printf("%llu\n", (unsigned long long)((after - before) / COUNT));

Result:

TSC page:
# ./gettime_cycles
131

MSR:
# ./gettime_cycles
5664

With all that, I see no reason not to enable the TSC page on 32-bit. Even if
the number of users is negligible, this will allow us to get rid of the ugly
#ifdef CONFIG_HYPERV_TSCPAGE in the code. I'll send a patch for discussion.

-- 
Vitaly
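To make Michael's point about the "64x64 multiply and shift" concrete, here is a minimal userspace sketch, not the kernel's code. It assumes the reference-time formula from the Hyper-V TLFS, time = ((tsc * tsc_scale) >> 64) + tsc_offset, with a sequence counter that must stay unchanged across the read (0 meaning "page invalid, fall back to the MSR"). The struct `tsc_page_sketch` and the helpers `mul_u64_u64_hi()`, `tsc_to_ref_time()` and `tsc_page_read()` are hypothetical names. The high half of the 128-bit product is built from four 32x32->64 multiplies, which is roughly the extra work a 32-bit kernel has to do. The read barriers that real code needs between the sequence and data reads are deliberately omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of the reference TSC page fields. */
struct tsc_page_sketch {
	volatile uint32_t tsc_sequence;	/* 0 => page not valid */
	uint32_t reserved;
	volatile uint64_t tsc_scale;	/* 64.64 fixed-point multiplier */
	volatile int64_t tsc_offset;
};

/*
 * High 64 bits of a 64x64 -> 128-bit product, using only
 * 32x32 -> 64 multiplies (what a 32-bit CPU can do natively).
 */
static uint64_t mul_u64_u64_hi(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;

	/* Sum of the middle 32-bit columns; fits in 34 bits, no overflow. */
	uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

	return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (cross >> 32);
}

/* time = ((tsc * scale) >> 64) + offset, with wrap-around addition. */
static uint64_t tsc_to_ref_time(uint64_t tsc, uint64_t scale, int64_t offset)
{
	return mul_u64_u64_hi(tsc, scale) + (uint64_t)offset;
}

/*
 * Sequence-checked read. Returns false if the page is invalid, in which
 * case a caller would fall back to the (slow, VMExit-causing) MSR read.
 * NOTE: real code needs read barriers around the data reads.
 */
static bool tsc_page_read(const struct tsc_page_sketch *pg,
			  uint64_t (*read_tsc)(void), uint64_t *time)
{
	uint32_t seq;
	uint64_t scale, tsc;
	int64_t offset;

	do {
		seq = pg->tsc_sequence;
		if (seq == 0)
			return false;
		scale = pg->tsc_scale;
		offset = pg->tsc_offset;
		tsc = read_tsc();
	} while (pg->tsc_sequence != seq);

	*time = tsc_to_ref_time(tsc, scale, offset);
	return true;
}
```

For example, a scale of 1ULL << 63 is 0.5 in 64.64 fixed point, so `tsc_to_ref_time(1000, 1ULL << 63, -100)` yields 400. On a 64-bit build the same high half is a single instruction (or an `unsigned __int128` multiply), which is why the cost gap Michael describes exists, even if, per the measurements above, it is still tiny next to a VMExit.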