Hi Thomas, Ingo, Peter.

I'm wondering, was the x86/timers branch of the tip tree merged into
Linus' tree for v5.0-rc1? Somehow I do not see that this patch made it
through... Am I doing something wrong?

--nX

On Tue, Nov 6, 2018 at 9:58 PM tip-bot for Daniel Vacek
<tipbot@xxxxxxxxx> wrote:
>
> Commit-ID:  a786ef152cdcfebc923a67f63c7815806eefcf81
> Gitweb:     https://git.kernel.org/tip/a786ef152cdcfebc923a67f63c7815806eefcf81
> Author:     Daniel Vacek <neelx@xxxxxxxxxx>
> AuthorDate: Mon, 5 Nov 2018 18:10:40 +0100
> Committer:  Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitDate: Tue, 6 Nov 2018 21:53:15 +0100
>
> x86/tsc: Make calibration refinement more robust
>
> The threshold in tsc_read_refs() is constant which may favor slower CPUs
> but may not be optimal for simple reading of reference on faster ones.
>
> Hence make it proportional to tsc_khz when available to compensate for
> this. The threshold guards against any disturbance like IRQs, NMIs, SMIs
> or CPU stealing by host on guest systems so rename it accordingly and
> fix comments as well.
>
> Also on some systems there is noticeable DMI bus contention at some point
> during boot keeping the readout failing (observed with about one in ~300
> boots when testing). In that case retry also the second readout instead of
> simply bailing out unrefined. Usually the next second the readout returns
> fast just fine without any issues.
>
> Signed-off-by: Daniel Vacek <neelx@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Link: https://lkml.kernel.org/r/1541437840-29293-1-git-send-email-neelx@xxxxxxxxxx
>
> ---
>  arch/x86/kernel/tsc.c | 30 ++++++++++++++++--------------
>  1 file changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index e9f777bfed40..3fae23834069 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -297,15 +297,16 @@ static int __init tsc_setup(char *str)
>
>  __setup("tsc=", tsc_setup);
>
> -#define MAX_RETRIES     5
> -#define SMI_TRESHOLD    50000
> +#define MAX_RETRIES            5
> +#define TSC_DEFAULT_THRESHOLD  0x20000
>
>  /*
> - * Read TSC and the reference counters. Take care of SMI disturbance
> + * Read TSC and the reference counters. Take care of any disturbances
>   */
>  static u64 tsc_read_refs(u64 *p, int hpet)
>  {
>         u64 t1, t2;
> +       u64 thresh = tsc_khz ? tsc_khz >> 5 : TSC_DEFAULT_THRESHOLD;
>         int i;
>
>         for (i = 0; i < MAX_RETRIES; i++) {
> @@ -315,7 +316,7 @@ static u64 tsc_read_refs(u64 *p, int hpet)
>                 else
>                         *p = acpi_pm_read_early();
>                 t2 = get_cycles();
> -               if ((t2 - t1) < SMI_TRESHOLD)
> +               if ((t2 - t1) < thresh)
>                         return t2;
>         }
>         return ULLONG_MAX;
> @@ -703,15 +704,15 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
>          * zero. In each wait loop iteration we read the TSC and check
>          * the delta to the previous read. We keep track of the min
>          * and max values of that delta. The delta is mostly defined
> -        * by the IO time of the PIT access, so we can detect when a
> -        * SMI/SMM disturbance happened between the two reads. If the
> +        * by the IO time of the PIT access, so we can detect when
> +        * any disturbance happened between the two reads. If the
>          * maximum time is significantly larger than the minimum time,
>          * then we discard the result and have another try.
>          *
>          * 2) Reference counter. If available we use the HPET or the
>          * PMTIMER as a reference to check the sanity of that value.
>          * We use separate TSC readouts and check inside of the
> -        * reference read for a SMI/SMM disturbance. We dicard
> +        * reference read for any possible disturbance. We dicard
>          * disturbed values here as well. We do that around the PIT
>          * calibration delay loop as we have to wait for a certain
>          * amount of time anyway.
> @@ -744,7 +745,7 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
>                 if (ref1 == ref2)
>                         continue;
>
> -               /* Check, whether the sampling was disturbed by an SMI */
> +               /* Check, whether the sampling was disturbed */
>                 if (tsc1 == ULLONG_MAX || tsc2 == ULLONG_MAX)
>                         continue;
>
> @@ -1268,7 +1269,7 @@ static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work);
>   */
>  static void tsc_refine_calibration_work(struct work_struct *work)
>  {
> -       static u64 tsc_start = -1, ref_start;
> +       static u64 tsc_start = ULLONG_MAX, ref_start;
>         static int hpet;
>         u64 tsc_stop, ref_stop, delta;
>         unsigned long freq;
> @@ -1283,14 +1284,15 @@ static void tsc_refine_calibration_work(struct work_struct *work)
>          * delayed the first time we expire. So set the workqueue
>          * again once we know timers are working.
>          */
> -       if (tsc_start == -1) {
> +       if (tsc_start == ULLONG_MAX) {
> +restart:
>                 /*
>                  * Only set hpet once, to avoid mixing hardware
>                  * if the hpet becomes enabled later.
>                  */
>                 hpet = is_hpet_enabled();
> -               schedule_delayed_work(&tsc_irqwork, HZ);
>                 tsc_start = tsc_read_refs(&ref_start, hpet);
> +               schedule_delayed_work(&tsc_irqwork, HZ);
>                 return;
>         }
>
> @@ -1300,9 +1302,9 @@ static void tsc_refine_calibration_work(struct work_struct *work)
>         if (ref_start == ref_stop)
>                 goto out;
>
> -       /* Check, whether the sampling was disturbed by an SMI */
> -       if (tsc_start == ULLONG_MAX || tsc_stop == ULLONG_MAX)
> -               goto out;
> +       /* Check, whether the sampling was disturbed */
> +       if (tsc_stop == ULLONG_MAX)
> +               goto restart;
>
>         delta = tsc_stop - tsc_start;
>         delta *= 1000000LL;
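
For anyone skimming the quoted patch, here is a rough userspace sketch of how the new threshold in tsc_read_refs() scales. It is not kernel code: tsc_khz is passed as a parameter instead of being the kernel global, and cycles_to_ns() is a hypothetical helper added only for illustration. The threshold expression itself is copied from the diff above.

```c
#include <stdint.h>

/* Fallback from the patch, used while tsc_khz is still unknown (zero). */
#define TSC_DEFAULT_THRESHOLD 0x20000

/*
 * Same expression as in the patched tsc_read_refs(). In the kernel tsc_khz
 * is a global; it is a parameter here only because this is a sketch.
 */
static uint64_t tsc_read_thresh(uint64_t tsc_khz)
{
        return tsc_khz ? tsc_khz >> 5 : TSC_DEFAULT_THRESHOLD;
}

/*
 * Hypothetical helper (not in the kernel): convert a TSC cycle count to
 * nanoseconds for a given TSC frequency in kHz. tsc_khz must be non-zero.
 */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
        return cycles * 1000000ULL / tsc_khz;
}
```

With tsc_khz = 3000000 (3 GHz) the threshold is 93750 cycles, and with tsc_khz = 1000000 (1 GHz) it is 31250 cycles; both correspond to 31250 ns, since tsc_khz >> 5 is 1/32 of a millisecond worth of cycles. The old constant of 50000 cycles would be about 50 us at 1 GHz but only about 16.7 us at 3 GHz, which appears to be the imbalance the commit message describes.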