On Sat, 14 Dec 2019 at 16:21, Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > > HI, > > On 13-12-2019 21:16, Ard Biesheuvel wrote: > > On Fri, 13 Dec 2019 at 21:12, Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > >> > >> Hi, > >> > >> On 12/13/19 9:08 PM, Ard Biesheuvel wrote: > >>> On Fri, 13 Dec 2019 at 20:56, Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > >>>> > >>>> Hi, > >>>> > >>>> On 12/13/19 7:49 PM, Ard Biesheuvel wrote: > >>>>> On Fri, 13 Dec 2019 at 13:29, Hans de Goede <hdegoede@xxxxxxxxxx> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> > >>>>>> On 13-12-2019 10:11, Ard Biesheuvel wrote: > >>>>>>> EFI mixed mode is a nice hack, since it allows us to run 64-bit Linux > >>>>>>> on low end x86_64 machines that shipped with 32-bit UEFI as they were > >>>>>>> built to run 32-bit Windows only. > >>>>>>> > >>>>>>> Mixed mode relies on the ability to convert calls made using the > >>>>>>> 64-bit calling convention into calls using the 32-bit one. This > >>>>>>> involves pushing a 32-bit word onto the stack for each argument > >>>>>>> passed in a 64-bit register, relying on the fact that all quantities > >>>>>>> that are the native size or smaller (including pointers) can be safely > >>>>>>> truncated to 32 bits. (In the pointer case, we rely on the fact that > >>>>>>> we are still executing in the firmware context, which uses a 1:1 > >>>>>>> mapping that can only access the lower 4 GB of the address space) > >>>>>>> > >>>>>>> For types that are explicitly 64 bits wide, such as EFI_PHYSICAL_ADDRESS > >>>>>>> or UINT64, this assumption doesn't hold. The correct way to marshall > >>>>>>> such a call would be to push two consecutive 32-bit words onto the > >>>>>>> stack, but given that the naive thunking code has no knowledge > >>>>>>> whatsoever of the prototype of the function it is invoking, all we can > >>>>>>> do is avoid calling such functions altogether. > >>>>>>> > >>>>>>> The FreePages() boot service is affected by this, so we should not call > >>>>>>> that at all in mixed mode. In practice, this doesn't change much, since > >>>>>>> in the past, these calls would have been made with a bogus address, and > >>>>>>> so we were leaking this memory already. Note that the scope of this leak > >>>>>>> is the EFI execution context only, so it makes no difference for Linux. > >>>>>>> > >>>>>>> The other piece of functionality that we need to disable is loading files > >>>>>>> passed via file=xxxx on the command line, given that the Open() method > >>>>>>> takes two UINT64s as well. > >>>>>>> > >>>>>>> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx> > >>>>>> > >>>>>> Just ignoring the file= arguments is fine with me, as you say this has > >>>>>> been broken on mixed-mode since forever so likely no-one is using it: > >>>>>> > >>>>>> Acked-by: Hans de Goede <hdegoede@xxxxxxxxxx> > >>>>>> > >>>>> > >>>>> Thanks. > >>>>> > >>>>> Do you have any recommendations on how to test this? Are you using GRUB to boot? > >>>>> > >>>>> I am trying to test the random.c failure using QEMU+OVMF, which > >>>>> implements the EFI_RNG_PROTOCOL on top of virtio-rng-pci, but I cannot > >>>>> reproduce the failure. > >>>> > >>>> I hit the random.c issue when testing a 5.5-rc1 x86_64 kernel on a Bay Trail > >>>> tablet. Almost any Bay Trail hw will come with 32 bit uefi because when Bay > >>>> Trail tablets (and 2-in-1s) first hit the market the 64 bit Windows drivers > >>>> were not ready yet and running 32 bit Windows requires a 32 bit UEFI > >>>> (Bay Trail devices do not have a classic bios mode / CSM). > >>>> > >>>> A popular model example machine of such a setup is The Asus T100TA 2-in-1. > >>>> > >>>> I'm using a standard Fedora install on these machines which goes: > >>>> UEFI -> 32-bit-secureboot-shim -> 32-bit-uefi-grub -> 64 bit kernel > >>>> > >>> > >>> And after applying the fix, do you now get a RNG=0x.... on the line > >>> that has ACPI, SMBIOS etc? > >> > >> No I get: > >> > >> [ 0.000000] efi: ACPI=0x3b71f000 ACPI 2.0=0x3b71f014 ESRT=0x3b6ed000 SMBIOS=0x3baa8310 TPMEventLog=0x37e95010 > >> > >> No RNG there. Note this is on a slightly different Bay Trail device. > >> > > > > It is slightly surprising that this mixed mode bug gets tickled even > > though the protocol in question doesn't even exist. > > As mentioned I was testing on a differtent model mixed mode Bay Trail tablet > as I did not have the tablet where I originally hit this at hand. > > I've just tested this on the tablet where I originally hit this; and the RNG= > bit is there: > > [ 0.000000] efi: ACPI=0x39178000 ACPI 2.0=0x39178014 ESRT=0x3914f000 SMBI > OS=0x39b79290 RNG=0x39b79190 TPMEventLog=0x31847010 > > I've added some debugging pr_efi_error calls and on the tablet I was testing with > yesterday the locate_protocol call indeed fails on that one. > > I've done quick comparison of the FW versions, both use AMI AptIO as a BIOS, > With identical core versions; and on the model *without* the RNG support the TXE FW > version is newer: > > Model without RNG proto support: > TXE FW Version 01.01.00.1115 > > Model with RNG proto support: > TXE FW Version 01.00.04.1090 > > So no idea why some of these devices have RNG support and others do not, > anyways the good news is that between these 2 devices both paths have now > been tested in mixed-mode and with my fix in place both paths work. > Thanks for testing. That explains the boot hang, indeed.