Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU

Jacob Shin <jacob.shin@xxxxxxx> · Wed, 19 Dec 2012 18:07:23 -0600

On Wed, Dec 19, 2012 at 03:50:14PM -0800, H. Peter Anvin wrote:
> On 12/19/2012 03:40 PM, Jacob Shin wrote:
> >>
> >>Just make the hole a bit bigger, so it starts at 0xfc00000000, then you
> >>only need one MTRR.  This is the correct BIOS-level fix, and it really
> >>needs to happen.
> >>
> >>Do these systems actually exist in the field or are they engineering
> >>prototypes?  In the latter case, we might be done at that point.
> >
> >Yes, HP is shipping (or will ship soon) such systems.
> >
> 
> Can you get them to fix the BIOS first, or at least ship a BIOS
> update?  Otherwise there will be a probabilistic failure, and it
> sounds like it is your (AMD's) fault.
> 
> >>The other bit is that building the real kernel page tables iteratively
> >>(ignoring the early page tables here) is safer, since the real page
> >>table builder is fully aware of the memory map.  This means any
> >>"spillover" from the early page tables gets minimized to regions where
> >>there are data objects that have to be accessed early.  Since Yinghai
> >>already had iterative page table building working, I don't see any
> >>reason to not use that capability.
> >
> >Yes, I'll test again with latest, but Yinghai's patchset mapping only
> >RAM from top down solved our problem.
> 
> Please don't make me go Steve Ballmer on you.
> 
> We're talking about two different things... the early page tables
> versus the permanent page tables.  The permanent page tables we can
> handle because the page table creation at that point is aware of the
> memory map.

Ah okay,

> 
> The early page tables are what is used before we get to that point.
> Creating them on demand means that if there are no early-needed data
> structures near the hole, there will be no access and everything
> will be okay, but the early page table creation *is not and
> cannot be* aware of the memory map.  Right now that simply cannot
> happen, because all such data structures are confined to 32-bit
> addresses, however *THAT WILL CHANGE AND WILL CHANGE SOON*, exactly
> because these kinds of large-memory systems need that to happen.
> You may start seeing failures at that time, and there isn't a huge
> lot we can do about it.
> 
> We are trying to discuss mitigation strategies with you, but you
> haven't really given us any useful information, e.g. what happens
> near the various boundaries of the hole, what could trigger
> prefetching into the range, and what it would take to fix the BIOSes.

From what I remember, accessing memory around the memory hole (not
just the HT hole, but 0xe038000000 ~ 0x10000000000 on the system we
mentioned) generated prefetches because the memory hole was marked as
WB in PAT.

I'll take a look at the system again, try the blanket MTRR covering
0xe000000000 ~ 1TB, and talk to our BIOS guys.
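
As a rough sanity check on the "one MTRR" point, here is a minimal sketch
that splits an address range into naturally aligned power-of-two blocks,
which is the shape a single variable-range MTRR can describe. The helper
name mtrr_blocks is hypothetical, and the greedy split assumes plain
non-overlapping ranges (it ignores tricks with overlapping MTRR types), so
treat the block count as an estimate rather than what a BIOS would program.

```python
ONE_TB = 1 << 40

def mtrr_blocks(start, end):
    """Split [start, end) into naturally aligned power-of-two blocks.

    Hypothetical helper for illustration only: each (base, size) pair has
    a power-of-two size and a base aligned to that size.
    """
    blocks = []
    while start < end:
        # Largest power of two that the current base is aligned to.
        align = (start & -start) if start else 1 << end.bit_length()
        # Largest power of two that still fits in the remaining length.
        fit = 1 << ((end - start).bit_length() - 1)
        size = min(align, fit)
        blocks.append((start, size))
        start += size
    return blocks

# Ranges taken from this thread, all ending at 1TB.
for base in (0xfc00000000, 0xe000000000, 0xe038000000):
    blocks = mtrr_blocks(base, ONE_TB)
    print(hex(base), "->", len(blocks), "block(s)")
    for b, s in blocks:
        print("   base", hex(b), "size", hex(s))
```

Under those assumptions, the ranges starting at 0xfc00000000 and
0xe000000000 each come out as a single block, while the range starting at
0xe038000000 needs several.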

> 
> 	-hpa
> 
> -- 
> H. Peter Anvin, Intel Open Source Technology Center
> I work for Intel.  I don't speak on their behalf.
> 
> 
