On Tue, Sep 14, 2010 at 4:10 PM, tip-bot for Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > Commit-ID: 54ff7e595d763d894104d421b103a89f7becf47c > Gitweb: http://git.kernel.org/tip/54ff7e595d763d894104d421b103a89f7becf47c > Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > AuthorDate: Tue, 14 Sep 2010 22:10:21 +0200 > Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > CommitDate: Wed, 15 Sep 2010 00:55:13 +0200 > > x86: hpet: Work around hardware stupidity > > This more or less reverts commits 08be979 (x86: Force HPET > readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict > read back to affected ATI chipsets) to the status of commit 8da854c > (x86, hpet: Erratum workaround for read after write of HPET > comparator). > > The delta to commit 8da854c is mostly comments and the change from > WARN_ONCE to printk_once as we know the call path of this function > already. > > This needs really in depth explanation: > > First of all the HPET design is a complete failure. Having a counter > compare register which generates an interrupt on matching values > forces the software to do at least one superfluous readback of the > counter register. > > While it is nice in theory to program "absolute" time events it is > practically useless because the timer runs at some absurd frequency > which can never be matched to real world units. So we are forced to > calculate a relative delta and this forces a readout of the actual > counter value, adding the delta and programming the compare > register. When the delta is small enough we run into the danger that > we program a compare value which is already in the past. Due to the > compare for equal nature of HPET we need to read back the counter > value after writing the compare rehgister (btw. this is necessary for > absolute timeouts as well) to make sure that we did not miss the timer > event. We try to work around that by setting the minimum delta to a > value which is larger than the theoretical time which elapses between > the counter readout and the compare register write, but that's only > true in theory. A NMI or SMI which hits between the readout and the > write can easily push us beyond that limit. This would result in > waiting for the next HPET timer interrupt until the 32bit wraparound > of the counter happens which takes about 306 seconds. > > So we designed the next event function to look like: > > match = read_cnt() + delta; > write_compare_ref(match); > return read_cnt() < match ? 0 : -ETIME; > > At some point we got into trouble with certain ATI chipsets. Even the > above "safe" procedure failed. The reason was that the write to the > compare register was delayed probably for performance reasons. The > theory was that they wanted to avoid the synchronization of the write > with the HPET clock, which is understandable. So the write does not > hit the compare register directly instead it goes to some intermediate > register which is copied to the real compare register in sync with the > HPET clock. That opens another window for hitting the dreaded "wait > for a wraparound" problem. > > To work around that "optimization" we added a read back of the compare > register which either enforced the update of the just written value or > just delayed the readout of the counter enough to avoid the issue. We > unfortunately never got any affirmative info from ATI/AMD about this. > > One thing is sure, that we nuked the performance "optimization" that > way completely and I'm pretty sure that the result is worse than > before some HW folks came up with those. > > Just for paranoia reasons I added a check whether the read back > compare register value was the same as the value we wrote right > before. That paranoia check triggered a couple of years after it was > added on an Intel ICH9 chipset. Venki added a workaround (commit > 8da854c) which was reading the compare register twice when the first > check failed. We considered this to be a penalty in general and > restricted the readback (thus the wasted CPU cycles) to the known to > be affected ATI chipsets. > > This turned out to be a utterly wrong decision. 2.6.35 testers > experienced massive problems and finally one of them bisected it down > to commit 30a564be which spured some further investigation. > > Finally we got confirmation that the write to the compare register can > be delayed by up to two HPET clock cycles which explains the problems > nicely. All we can do about this is to go back to Venki's initial > workaround in a slightly modified version. > > Just for the record I need to say, that all of this could have been > avoided if hardware designers and of course the HPET committee would > have thought about the consequences for a split second. It's out of my > comprehension why designing a working timer is so hard. There are two > ways to achieve it: > > 1) Use a counter wrap around aware compare_reg <= counter_reg > implementation instead of the easy compare_reg == counter_reg > > Downsides: > > - It needs more silicon. > > - It needs a readout of the counter to apply a relative > timeout. This is necessary as the counter does not run in > any useful (and adjustable) frequency and there is no > guarantee that the counter which is used for timer events is > the same which is used for reading the actual time (and > therefor for calculating the delta) > > Upsides: > > - None > > 2) Use a simple down counter for relative timer events > > Downsides: > > - Absolute timeouts are not possible, which is not a problem > at all in the context of an OS and the expected > max. latencies/jitter (also see Downsides of #1) > > Upsides: > > - It needs less or equal silicon. > > - It works ALWAYS > > - It is way faster than a compare register based solution (One > write versus one write plus at least one and up to four > reads) > > I would not be so grumpy about all of this, if I would not have been > ignored for many years when pointing out these flaws to various > hardware folks. I really hate timers (at least those which seem to be > designed by janitors). > > Though finally we got a reasonable explanation plus a solution and I > want to thank all the folks involved in chasing it down and providing > valuable input to this. > > Bisected-by: Nix <nix@xxxxxxxxxxxxx> > Reported-by: Artur Skawina <art.08.09@xxxxxxxxx> > Reported-by: Damien Wyart <damien.wyart@xxxxxxx> > Reported-by: John Drescher <drescherjm@xxxxxxxxx> > Cc: Venkatesh Pallipadi <venki@xxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxx> > Cc: H. Peter Anvin <hpa@xxxxxxxxx> > Cc: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx> > Cc: Andreas Herrmann <andreas.herrmann3@xxxxxxx> > Cc: Borislav Petkov <borislav.petkov@xxxxxxx> > Cc: stable@xxxxxxxxxx > Acked-by: Suresh Siddha <suresh.b.siddha@xxxxxxxxx> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > --- > arch/x86/include/asm/hpet.h | 1 - > arch/x86/kernel/early-quirks.c | 18 ------------------ > arch/x86/kernel/hpet.c | 31 +++++++++++++++++-------------- > 3 files changed, 17 insertions(+), 33 deletions(-) > > diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h > index 004e6e2..1d5c08a 100644 > --- a/arch/x86/include/asm/hpet.h > +++ b/arch/x86/include/asm/hpet.h > @@ -68,7 +68,6 @@ extern unsigned long force_hpet_address; > extern u8 hpet_blockid; > extern int hpet_force_user; > extern u8 hpet_msi_disable; > -extern u8 hpet_readback_cmp; > extern int is_hpet_enabled(void); > extern int hpet_enable(void); > extern void hpet_disable(void); > diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c > index e5cc7e8..ebdb85c 100644 > --- a/arch/x86/kernel/early-quirks.c > +++ b/arch/x86/kernel/early-quirks.c > @@ -18,7 +18,6 @@ > #include <asm/apic.h> > #include <asm/iommu.h> > #include <asm/gart.h> > -#include <asm/hpet.h> > > static void __init fix_hypertransport_config(int num, int slot, int func) > { > @@ -192,21 +191,6 @@ static void __init ati_bugs_contd(int num, int slot, int func) > } > #endif > > -/* > - * Force the read back of the CMP register in hpet_next_event() > - * to work around the problem that the CMP register write seems to be > - * delayed. See hpet_next_event() for details. > - * > - * We do this on all SMBUS incarnations for now until we have more > - * information about the affected chipsets. > - */ > -static void __init ati_hpet_bugs(int num, int slot, int func) > -{ > -#ifdef CONFIG_HPET_TIMER > - hpet_readback_cmp = 1; > -#endif > -} > - > #define QFLAG_APPLY_ONCE 0x1 > #define QFLAG_APPLIED 0x2 > #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED) > @@ -236,8 +220,6 @@ static struct chipset early_qrk[] __initdata = { > PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs }, > { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS, > PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd }, > - { PCI_VENDOR_ID_ATI, PCI_ANY_ID, > - PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_hpet_bugs }, > {} > }; > > diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c > index 351f9c0..410fdb3 100644 > --- a/arch/x86/kernel/hpet.c > +++ b/arch/x86/kernel/hpet.c > @@ -35,7 +35,6 @@ > unsigned long hpet_address; > u8 hpet_blockid; /* OS timer block num */ > u8 hpet_msi_disable; > -u8 hpet_readback_cmp; > > #ifdef CONFIG_PCI_MSI > static unsigned long hpet_num_timers; > @@ -395,23 +394,27 @@ static int hpet_next_event(unsigned long delta, > * at that point and we would wait for the next hpet interrupt > * forever. We found out that reading the CMP register back > * forces the transfer so we can rely on the comparison with > - * the counter register below. > + * the counter register below. If the read back from the > + * compare register does not match the value we programmed > + * then we might have a real hardware problem. We can not do > + * much about it here, but at least alert the user/admin with > + * a prominent warning. > * > - * That works fine on those ATI chipsets, but on newer Intel > - * chipsets (ICH9...) this triggers due to an erratum: Reading > - * the comparator immediately following a write is returning > - * the old value. > + * An erratum on some chipsets (ICH9,..), results in > + * comparator read immediately following a write returning old > + * value. Workaround for this is to read this value second > + * time, when first read returns old value. > * > - * We restrict the read back to the affected ATI chipsets (set > - * by quirks) and also run it with hpet=verbose for debugging > - * purposes. > + * In fact the write to the comparator register is delayed up > + * to two HPET cycles so the workaround we tried to restrict > + * the readback to those known to be borked ATI chipsets > + * failed miserably. So we give up on optimizations forever > + * and penalize all HPET incarnations unconditionally. > */ > - if (hpet_readback_cmp || hpet_verbose) { > - u32 cmp = hpet_readl(HPET_Tn_CMP(timer)); > - > - if (cmp != cnt) > + if (unlikely((u32)hpet_readl(HPET_Tn_CMP(timer)) != cnt)) { > + if (hpet_readl(HPET_Tn_CMP(timer)) != cnt) Minor nit. I guess (u32) in first check above is not needed as hpet_readl actually returns unsigned int. Otherwise Acked-by: Venkatesh Pallipadi <venki@xxxxxxxxxx> > printk_once(KERN_WARNING > - "hpet: compare register read back failed.\n"); > + "hpet: compare register read back failed.\n"); > } > > return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0; > -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
![]() |