Hi Thomas, On Fri, Nov 27, 2020 at 12:27:34AM +0100, Thomas Gleixner wrote: > Feng, > > On Thu, Nov 26 2020 at 09:24, Feng Tang wrote: > > On Wed, Nov 25, 2020 at 01:46:23PM +0100, Thomas Gleixner wrote: > >> Now the more interesting question is why this needs to be a PCI quirk in > >> the first place. Can't we just disable the HPET based on family/model > >> quirks? > >> > >> e0748539e3d5 ("x86/intel: Disable HPET on Intel Ice Lake platforms") > >> f8edbde885bb ("x86/intel: Disable HPET on Intel Coffee Lake H platforms") > >> fc5db58539b4 ("x86/quirks: Disable HPET on Intel Coffe Lake platforms") > >> 62187910b0fc ("x86/intel: Add quirk to disable HPET for the Baytrail platform") > > > I added this commit, and I can explain some for Baytrail. There was > > some discussion about the way to disable it: > > https://lore.kernel.org/lkml/20140328073718.GA12762@feng-snb/t/ > > > > It uses PCI ID early quirk in the hope that later Baytrail stepping > > doesn't have the problem. And later on, there was official document > > (section 18.10.1.3 http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-z8000-datasheet-vol-1.pdf) > > stating Baytrail's HPET halts in deep idle. So I think your way of > > using CPUID to disable Baytrail HPET makes more sense. > > Correct. > > >> I might be missing something here, but in general on anything modern > >> HPET is mostly useless. > > > > IIUC, nowdays HPET's main use is as a clocksource watchdog monitor. > > Plus for the TSC refined calibration, where it is really better than > anything else we have _if_ it is functional. > > > And in one debug case, we found it still useful. The debug platform has > > early serial console which prints many messages in early boot phase, > > when the HPET is disabled, the software 'jiffies' clocksource will > > be used as the monitor. Early printk will disable interrupt will > > printing message, and this could be quite long for a slow 115200 > > device, and cause the periodic HW timer interrupt get missed, and > > make the 'jiffies' clocksource not accurate, which will in turn > > judge the TSC clocksrouce inaccurate, and disablt it. (Adding Rui, > > Len for more details) > > Yes, that can happen. But OTOH, we should start to think about the > requirements for using the TSC watchdog. > > I'm inclined to lift that requirement when the CPU has: > > 1) X86_FEATURE_CONSTANT_TSC > 2) X86_FEATURE_NONSTOP_TSC > 3) X86_FEATURE_NONSTOP_TSC_S3 IIUC, this feature exists for several generations of Atom platforms, and it is always coupled with 1) and 2), so it could be skipped for the checking. > 4) X86_FEATURE_TSC_ADJUST > > 5) At max. 4 sockets > > After two decades of horrors we're finally at a point where TSC seems to > be halfways reliable and less abused by BIOS tinkerers. TSC_ADJUST was > really key as we can now detect even small modifications reliably and > the important point is that we can cure them as well (not pretty but > better than all other options). > > The only nasty one in the list above is #5 because AFAIK there is still > no architecural guarantee for TSCs being synchronized on machines with > more than 4 sockets. I might be wrong, but then nobody told me. > > The only reason I hate to disable HPET upfront at least during boot is > that HPET is the best mechanism for the refined TSC calibration. PMTIMER > sucks because it's slow and wraps around pretty quick. > > So we could do the following even on platforms where HPET stops in some > magic PC? state: > > - Register it during early boot as clocksource > > - Prevent the enablement as clockevent and the chardev hpet timer muck > > - Prevent the magic PC? state up to the point where the refined > TSC calibration is finished. > > - Unregister it once the TSC has taken over as system clocksource and > enable the magic PC? state in which HPET gets disfunctional. This looks reasonable to me. I have thought about lowering the hpet rating to lower than PMTIMER, so it still contributes in early boot phase, and fades out after PMTIMER is initialised. Thanks, Feng > Hmm? > > Thanks, > > tglx > >