Re: [RFC PATCH] efi/x86: limit EFI old memory map to SGI UV1 machines

Russ Anderson <rja@xxxxxxx> · Sun, 5 Jan 2020 23:01:57 -0600

On Fri, Jan 03, 2020 at 09:14:14AM +0100, Ard Biesheuvel wrote:
> On Fri, 3 Jan 2020 at 00:13, Russ Anderson <rja@xxxxxxx> wrote:
> > been used to work around issues.
> >
> > One was when KASLR was added (as part of the Spectre/Meltdown
> > mitigation).  The initial implementation broke with new
> > map so efi=old_map was used as a workaround.  I don't know
> > if this was a distro specific breakage or upstream, but the
> > workaround limited the impact and the breakage was quickly
> > fixed.
> >
> > Another time was the EFI locking issue mentioned earlier
> > in this thread.
> >
> 
> So are you saying the distros updated their kernels which subsequently
> broke your platforms, and you needed to use efi=old_map in production
> to work around this? This sounds like something that should have been
> caught in testing before the release was made.

The Spectre/Meltdown change was rushed through, without proper
testing.  The lesson was that even security fixes need full testing.
No one involved (that I am aware of) wants to repeat releasing
code that has not been fully tested.

The EFI locking issue was caught by the HPE BRT (Basic Regression Test)
team, but after it had been released by distros.  It was a small
timing hole that ALMOST always worked, which is why it was not detected
immediately.

> Is there any way you could make one of these systems
> available/accessible for testing new kernels? Also, was the breakage
> related specifically to the use of the UV runtime services?

HPE does have systems at Red Hat and SUSE (part of the distro test
environments), along with internal test systems.  HPE does have access
to pre-release distro (RHEL, SLES, Oracle Linux) kernels, including
nightly development builds.  I have a kernel engineer on-site at Red Hat
with access to RH kernel engineers and git trees.  We do test upstream
kernels and have fixed regressions.  That said, we have limited
resources (test systems, people, time) and do as much as we can with
what we have, so our test coverage is not perfect, but we do our best.

> > Is there a compelling reason to put efi=old_map quirk
> > under CONFIG_X86_UV=y?  The original patch description assumed
> > only old SGI UV1 used efi=old_map, that it had not been
> > used on newer hardware, but that isn't the case.  It has been
> > used on newer currently shipping hardware.  Given that
> > new information is there a compelling reason for the change?
> 
> Every feature like this doubles the size of the validation matrix, and
> so restricting efi=old_map to a single target helps to keep the
> maintenance effort manageable.

Understood.  

Thanks.
-- 
Russ Anderson,  SuperDome Flex Linux Kernel Group Manager
HPE - Hewlett Packard Enterprise (formerly SGI)  rja@xxxxxxx