RE: >> 4.10 has new code which utilizes the TSC_ADJUST MSR.

I just built an unpatched linux v4.10 with tglx's TSC improvements -
much else improved in this kernel (like iwlwifi) - thanks!

I have attached an updated version of the test program which
doesn't print the bogus "Nominal TSC Frequency" (the previous
version printed it, but equally ignored it).

The clock_gettime(CLOCK_MONOTONIC_RAW,&ts) latency has improved by
a factor of 2 - it used to be about 140ns and is now about 70ns! Wow!

$ uname -r
4.10.0
$ ./ttsc1
max_extended_leaf: 80000008
has tsc: 1 constant: 1
Invariant TSC is enabled: Actual TSC freq: 2.893299GHz.
ts2 - ts1: 144 ts3 - ts2: 96 ns1: 0.000000588 ns2: 0.000002599
ts3 - ts2: 178 ns1: 0.000000592
ts3 - ts2: 14 ns1: 0.000000577
ts3 - ts2: 14 ns1: 0.000000651
ts3 - ts2: 17 ns1: 0.000000625
ts3 - ts2: 17 ns1: 0.000000677
ts3 - ts2: 17 ns1: 0.000000626
ts3 - ts2: 17 ns1: 0.000000627
ts3 - ts2: 17 ns1: 0.000000627
ts3 - ts2: 18 ns1: 0.000000655
ts3 - ts2: 17 ns1: 0.000000631
t1 - t0: 89067 - ns2: 0.000091411

I think this is because under Linux 4.8, the CPU got a fault every
time it read the TSC_ADJUST MSR.

But user programs that want to use the TSC and correlate its value
accurately to clock_gettime(CLOCK_MONOTONIC_RAW) values, like the
above program, still have to dig the TSC frequency value out of the
kernel with objdump - this was really the point of bug #194609.

I would still like to investigate exporting the 'tsc_khz', 'mult' and
'shift' values via sysfs.

Regards,
Jason.

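To make concrete what a user-space program could do if 'tsc_khz' were exported, here is a minimal sketch of the mult/shift computation and the cycles-to-nanoseconds conversion. It paraphrases the kernel's clocks_calc_mult_shift(), much as the quoted code below does. The names calc_mult_shift() and cycles_to_ns() are hypothetical, and the sketch assumes the caller already knows the TSC frequency in kHz. It converts cycle deltas only; it does not reproduce the base offset the kernel adds when building a struct timespec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, paraphrasing the kernel's clocks_calc_mult_shift():
 * find mult/shift so that (cycles * mult) >> shift converts ticks at rate
 * 'from' into ticks at rate 'to' without overflowing 64 bits for intervals
 * of up to 'maxsec' seconds. */
static void calc_mult_shift(uint32_t *mult, uint32_t *shift,
                            uint32_t from, uint32_t to, uint32_t maxsec)
{
    uint64_t tmp;
    uint32_t sft, sftacc = 32;

    assert(from != 0);

    /* Every significant bit of (maxsec * from) >> 32 removes one bit
     * from the width that mult is allowed to occupy. */
    tmp = ((uint64_t)maxsec * from) >> 32;
    while (tmp) {
        tmp >>= 1;
        sftacc--;
    }

    /* Pick the largest shift whose rounded mult fits in sftacc bits. */
    for (sft = 32; sft > 0; sft--) {
        tmp = (uint64_t)to << sft;
        tmp += from / 2;            /* round to nearest */
        tmp /= from;
        if ((tmp >> sftacc) == 0)
            break;
    }
    *mult = (uint32_t)tmp;
    *shift = sft;
}

/* Hypothetical helper: convert a TSC cycle delta to nanoseconds. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

With from = 2893299 (the kHz value reported above), to = NSEC_PER_SEC / 1000 and maxsec = 600 * 1000, one second's worth of cycles should convert to very nearly 1000000000 ns.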
On 21/02/2017, Jason Vas Dias <jason.vas.dias@xxxxxxxxx> wrote:
> Thank You for enlightening me -
>
> I was just having a hard time believing that Intel would ship a chip
> that features a monotonic, fixed frequency timestamp counter
> without specifying in either documentation or on-chip or in ACPI what
> precisely that hard-wired frequency is, but I now know that to
> be the case for the unfortunate i7-4910MQ - I mean, the CPU does
> assert CPUID:80000007[8] ( InvariantTSC ), which is
> difficult to reconcile with the statement in the SDM :
>   17.16.4 Invariant Time-Keeping
>   The invariant TSC is based on the invariant timekeeping hardware
>   (called Always Running Timer or ART), that runs at the core crystal
>   clock frequency. The ratio defined by CPUID leaf 15H expresses the
>   frequency relationship between the ART hardware and TSC. If
>   CPUID.15H:EBX[31:0] != 0 and CPUID.80000007H:EDX[InvariantTSC] = 1,
>   the following linearity relationship holds between TSC and the ART
>   hardware:
>     TSC_Value = (ART_Value * CPUID.15H:EBX[31:0] )
>                 / CPUID.15H:EAX[31:0] + K
>   Where 'K' is an offset that can be adjusted by a privileged agent*2.
>   When ART hardware is reset, both invariant TSC and K are also reset.
>
> So I'm just trying to figure out what CPUID.15H:EBX[31:0] and
> CPUID.15H:EAX[31:0] are for my hardware. I assumed (incorrectly)
> that the "Nominal TSC Frequency" formulae in the manual must apply to
> all CPUs with InvariantTSC.
>
> Do I understand correctly, that since I do have InvariantTSC, the
> TSC_Value is in fact calculated according to the above formula, but with
> a "hidden" ART Value, & Core Crystal Clock frequency & its ratio to
> TSC frequency ?
> It was obvious this nominal TSC Frequency had nothing to do with the
> actual TSC frequency used by Linux, which is 'tsc_khz'.
> I guess wishful thinking led me to believe CPUID:15h was actually
> supported somehow, because I thought InvariantTSC meant it had ART
> hardware.
>
> I do strongly suggest that Linux exports its calibrated TSC kHz
> somewhere to user space.
>
> I think the best long-term solution would be to allow programs to
> somehow read the TSC without invoking
> clock_gettime(CLOCK_MONOTONIC_RAW,&ts), &
> having to enter the kernel, which incurs an overhead of > 120ns on my
> system.
>
>
> Couldn't linux export its 'tsc_khz' and / or 'clocksource->mult' and
> 'clocksource->shift' values to sysfs somehow ?
>
> For instance, only if the 'current_clocksource' is 'tsc', then these
> values could be exported as:
> /sys/devices/system/clocksource/clocksource0/shift
> /sys/devices/system/clocksource/clocksource0/mult
> /sys/devices/system/clocksource/clocksource0/freq
>
> So user-space programs could know that the value returned by
> clock_gettime(CLOCK_MONOTONIC_RAW)
> would be
> { .tv_sec  = ( ( rdtsc() * mult ) >> shift ) >> 32
> , .tv_nsec = ( ( rdtsc() * mult ) >> shift ) & ~0U
> }
> and that represents ticks of period (1.0 / ( freq * 1000 )) S.
>
> That would save user-space programs from having to know 'tsc_khz' by
> parsing the 'Refined TSC' frequency from log files or by examining the
> running kernel with objdump to obtain this value & figure out 'mult' &
> 'shift' themselves.
>
> And why not a
> /sys/devices/system/clocksource/clocksource0/value
> file that actually prints this ( ( rdtsc() * mult ) >> shift )
> expression as a long integer?
> And perhaps a
> /sys/devices/pnp0/XX\:YY/rtc/rtc0/nanoseconds
> file that actually prints out the number of real-time nano-seconds since
> the contents of the existing
> /sys/devices/pnp0/XX\:YY/rtc/rtc0/{time,since_epoch}
> files using the current TSC value?
> To read the rtc0/{date,time} files is already faster than entering the
> kernel to call
> clock_gettime(CLOCK_REALTIME, &ts) & convert to integer for scripts.
>
> I will work on developing a patch to this effect if no-one else is.
>
> Also, am I right in assuming that the maximum granularity of the real-time
> clock on my system is 1/64th of a second ? :
> $ cat /sys/devices/pnp0/00\:02/rtc/rtc0/max_user_freq
> 64
> This is the maximum granularity that can be stored in CMOS, not
> returned by TSC? Couldn't we have something similar that gave an
> accurate idea of TSC frequency and the precise formula applied to TSC
> value to get clock_gettime(CLOCK_MONOTONIC_RAW) value ?
>
> Regards,
> Jason
>
>
> This code does produce good timestamps with a latency of about 20ns
> that correlate well with clock_gettime(CLOCK_MONOTONIC_RAW,&ts)
> values, but it depends on a global variable that is initialized to
> the 'tsc_khz' value computed by the running kernel, parsed from
> objdump /proc/kcore output :
>
> static inline __attribute__((always_inline))
> U64_t
> IA64_tsc_now()
> { if(!(  _ia64_invariant_tsc_enabled
>       ||(( _cpu0id_fd == -1) && IA64_invariant_tsc_is_enabled(NULL,NULL))
>       )
>     )
>   { fprintf(stderr, __FILE__":%d:(%s): must be called with invariant TSC enabled.\n",
>             __LINE__, __func__);
>     return 0;
>   }
>   U32_t tsc_hi, tsc_lo;
>   register UL_t tsc;
>   asm volatile
>   ( "rdtscp\n\t"
>     "mov %%edx, %0\n\t"
>     "mov %%eax, %1\n\t"
>     "mov %%ecx, %2\n\t"
>   : "=m" (tsc_hi) ,
>     "=m" (tsc_lo) ,
>     "=m" (_ia64_tsc_user_cpu)
>   :
>   : "%eax","%ecx","%edx"
>   );
>   tsc=(((UL_t)tsc_hi) << 32)|((UL_t)tsc_lo);
>   return tsc;
> }
>
> __thread
> U64_t _ia64_first_tsc = 0xffffffffffffffffUL;
>
> static inline __attribute__((always_inline))
> U64_t IA64_tsc_ticks_since_start()
> { if(_ia64_first_tsc == 0xffffffffffffffffUL)
>   { _ia64_first_tsc = IA64_tsc_now();
>     return 0;
>   }
>   return (IA64_tsc_now() - _ia64_first_tsc) ;
> }
>
> static inline __attribute__((always_inline))
> void
> ia64_tsc_calc_mult_shift
> ( register U32_t *mult,
>   register U32_t *shift
> )
> { /* paraphrases Linux clocksource.c's clocks_calc_mult_shift() function:
>    * calculates second + nanosecond mult + shift in same way linux does.
>    * we want to be compatible with what linux returns in struct
>    * timespec ts after call to
>    * clock_gettime(CLOCK_MONOTONIC_RAW, &ts).
>    */
>   const U32_t scale=1000U;
>   register U32_t from= IA64_tsc_khz();
>   register U32_t to = NSEC_PER_SEC / scale;
>   register U64_t sec = ( ~0UL / from ) / scale;
>   sec = (sec > 600) ? 600 : ((sec > 0) ? sec : 1);
>   register U64_t maxsec = sec * scale;
>   UL_t tmp;
>   U32_t sft, sftacc=32;
>   /*
>    * Calculate the shift factor which is limiting the conversion
>    * range:
>    */
>   tmp = (maxsec * from) >> 32;
>   while (tmp)
>   { tmp >>=1;
>     sftacc--;
>   }
>   /*
>    * Find the conversion shift/mult pair which has the best
>    * accuracy and fits the maxsec conversion range:
>    */
>   for (sft = 32; sft > 0; sft--)
>   { tmp = ((UL_t) to) << sft;
>     tmp += from / 2;
>     tmp = tmp / from;
>     if ((tmp >> sftacc) == 0)
>       break;
>   }
>   *mult = tmp;
>   *shift = sft;
> }
>
> __thread
> U32_t _ia64_tsc_mult = ~0U, _ia64_tsc_shift=~0U;
>
> static inline __attribute__((always_inline))
> U64_t IA64_s_ns_since_start()
> { if( ( _ia64_tsc_mult == ~0U ) || ( _ia64_tsc_shift == ~0U ) )
>     ia64_tsc_calc_mult_shift( &_ia64_tsc_mult, &_ia64_tsc_shift);
>   register U64_t cycles = IA64_tsc_ticks_since_start();
>   register U64_t ns = ((cycles *((UL_t)_ia64_tsc_mult))>>_ia64_tsc_shift);
>   return( (((ns / NSEC_PER_SEC)&0xffffffffUL) << 32)
>         | ((ns % NSEC_PER_SEC)&0x3fffffffUL) );
>   /* Yes, we are purposefully ignoring durations of more than 4.2
>      billion seconds here! */
> }
>
>
> I think Linux should export the 'tsc_khz', 'mult' and 'shift' values
> somehow, then user-space libraries could have more confidence in using
> 'rdtsc' or 'rdtscp' if Linux's current_clocksource is 'tsc'.
>
> Regards,
> Jason
>
>
>
> On 20/02/2017, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> On Sun, 19 Feb 2017, Jason Vas Dias wrote:
>>
>>> CPUID:15H is available in user-space, returning the integers : ( 7,
>>> 832, 832 ) in EAX:EBX:ECX , yet boot_cpu_data.cpuid_level is 13 , so
>>> in detect_art() in tsc.c,
>>
>> By some definition of available. You can feed CPUID random leaf numbers
>> and it will return something, usually the value of the last valid CPUID
>> leaf, which is 13 on your CPU. A similar CPU model has
>>
>> 0x0000000d 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340
>> edx=0x00000000
>>
>> i.e. 7, 832, 832, 0
>>
>> Looks familiar, right?
>>
>> You can verify that with 'cpuid -1 -r' on your machine.
>>
>>> Linux does not think ART is enabled, and does not set the synthesized
>>> CPUID + ((3*32)+10) bit, so a program looking at /dev/cpu/0/cpuid
>>> would not see this bit set .
>>
>> Rightfully so. This is a Haswell Core model.
>>
>>> if an e1000 NIC card had been installed, PTP would not be available.
>>
>> PTP is independent of the ART kernel feature . ART just provides enhanced
>> PTP features. You are confusing things here.
>>
>> The ART feature as the kernel sees it is a hardware extension which feeds
>> the ART clock to peripherals for timestamping and time correlation
>> purposes. The ratio between ART and TSC is described by CPUID leaf 0x15
>> so the kernel can make use of that correlation, e.g. for enhanced PTP
>> accuracy.
>>
>> It's correct, that the NONSTOP_TSC feature depends on the availability of
>> ART, but that has nothing to do with the feature bit, which solely
>> describes the ratio between TSC and the ART frequency which is exposed to
>> peripherals. That frequency is not necessarily the real ART frequency.
>>
>>> Also, if the MSR TSC_ADJUST has not yet been written, as it seems to be
>>> nowhere else in Linux, the code will always think X86_FEATURE_ART is 0
>>> because the CPU will always get a fault reading the MSR since it has
>>> never been written.
>>
>> Huch? If an access to the TSC ADJUST MSR faults, then something is really
>> wrong. And writing it unconditionally to 0 is not going to happen. 4.10
>> has new code which utilizes the TSC_ADJUST MSR.
>>
>>> It would be nice for user-space programs that want to use the TSC with
>>> rdtsc / rdtscp instructions, such as the demo program attached to the
>>> bug report,
>>> could have confidence that Linux is actually generating the results of
>>> clock_gettime(CLOCK_MONOTONIC_RAW, &timespec)
>>> in a predictable way from the TSC by looking at the
>>> /dev/cpu/0/cpuid[bit(((3*32)+10)] value before enabling user-space
>>> use of TSC values, so that they can correlate TSC values with linux
>>> clock_gettime() values.
>>
>> What has ART to do with correct CLOCK_MONOTONIC_RAW values?
>>
>> Nothing at all, really.
>>
>> The kernel makes use of the proper information values already.
>>
>> The TSC frequency is determined from:
>>
>> 1) CPUID(0x16) if available
>> 2) MSRs if available
>> 3) By calibration against a known clock
>>
>> If the kernel uses TSC as clocksource then the CLOCK_MONOTONIC_* values
>> are correct whether that machine has ART exposed to peripherals or not.
>>
>>> has tsc: 1 constant: 1
>>> 832 / 7 = 118 : 832 - 9.888914286E+04hz : OK:1
>>
>> And that voodoo math tells us what? That you found a way to correlate
>> CPUID(0xd) to the TSC frequency on that machine.
>>
>> Now I'm curious how you do that on this other machine which returns for
>> cpuid(15): 1, 1, 1
>>
>> You can't because all of this is completely wrong.
>>
>> Thanks,
>>
>> tglx
>>
>
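The point made in the thread about out-of-range CPUID leaves can be illustrated with a small sketch: before trusting leaf 0x15, compare it against the maximum basic leaf (CPUID.0:EAX), and only then apply the SDM's TSC_Value = ART_Value * EBX / EAX + K relation. The struct cpuid_regs type and the art_ratio_usable() and art_to_tsc() helpers are hypothetical names. The register values are passed in by the caller rather than read from the CPU, so the sketch stays portable. It is an illustration under those assumptions, not the kernel's detect_art() logic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four registers one CPUID call returns. */
struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Leaf 0x15 is meaningful only if the CPU reports it as a valid basic
 * leaf and both the denominator (EAX) and numerator (EBX) are non-zero. */
static int art_ratio_usable(uint32_t max_basic_leaf,
                            const struct cpuid_regs *leaf15)
{
    return max_basic_leaf >= 0x15 && leaf15->eax != 0 && leaf15->ebx != 0;
}

/* TSC_Value = ART_Value * EBX / EAX + K, with the division split into
 * quotient and remainder so the multiplication overflows less readily. */
static uint64_t art_to_tsc(uint64_t art, const struct cpuid_regs *leaf15,
                           uint64_t k)
{
    assert(leaf15->eax != 0);
    uint64_t q = art / leaf15->eax;
    uint64_t r = art % leaf15->eax;
    return q * leaf15->ebx + (r * leaf15->ebx) / leaf15->eax + k;
}
```

For the machine discussed here, the maximum basic leaf is 13, so the ( 7, 832, 832 ) values returned for leaf 0x15 fail the check and should not be treated as an ART/TSC ratio.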
Attachment:
ttsc.tar
Description: Unix tar archive