Re: New EFI thunk alignment WARN_ON in 5.6 triggers multiple times

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 2/12/20 12:53 PM, Ard Biesheuvel wrote:
On Wed, 12 Feb 2020 at 12:44, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:

Hi Ard,

While booting 5.6-rc1 on one of my test machines I noticed the WARN_ON
on line 198 of arch/x86/platform/efi/efi_64.c trigger many times.

I've done some debugging on this an this is caused by the following
call path:

drivers/firmware/efi/vars.c: efivar_init():

          unsigned long variable_name_size = 1024;
          efi_char16_t *variable_name;
          efi_guid_t vendor_guid;

          variable_name = kzalloc(variable_name_size, GFP_KERNEL);
          if (!variable_name) {
                  printk(KERN_ERR "efivars: Memory allocation failed.\n");
                  return -ENOMEM;
          }

         ...


          do {
                  variable_name_size = 1024;

                  status = ops->get_next_variable(&variable_name_size,
                                                  variable_name,
                                                  &vendor_guid);
         ...

arch/x86/platform/efi/efi_64.c: efi_thunk_get_next_variable()

         ...
          phys_vendor = virt_to_phys_or_null(vendor);
         ...

arch/x86/platform/efi/efi_64.c: virt_to_phys_or_null_size()

         ...
         WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
         ...

Specifically the problem is that the efi_guid_t vendor_guid has an 8 bytes
aligned address and the WARN_ON checks for it being aligned to\
sizeof(efi_guid_t) which is 16 bytes.

I've fixed this for now with the following local fix, but I'm not sure
what the alignment rules actually are so I'm not sure this is correct:

--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -181,6 +181,7 @@ static inline phys_addr_t
   virt_to_phys_or_null_size(void *va, unsigned long size)
   {
         bool bad_size;
+       int alignment;

         if (!va)
                 return 0;
@@ -195,7 +196,8 @@ virt_to_phys_or_null_size(void *va, unsigned long size)
          */
         bad_size = size > PAGE_SIZE || !is_power_of_2(size);

-       WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);
+       alignment = size > 8 ? 8 : size;
+       WARN_ON(!IS_ALIGNED((unsigned long)va, alignment) || bad_size);

         return slow_virt_to_phys(va);
   }


I have a feeling that this is the right thing to do, but as said I'm not 100%
sure. If you can confirm that this is the right fix, then I can submit this
upstream.



It seems that the purpose of the alignment check is to ensure that the
data does not cross a page boundary, so that the data is guaranteed to
be contiguous in the physical address space as well.

So in that light, the fix is most definitely wrong, although I am not
sure how this is supposed to work generally.

I'm not sure that is what it is trying to check, if that is what it is
trying to check then the code is certainly wrong.

Let me first quote the entire check:

        /*
         * A fully aligned variable on the stack is guaranteed not to
         * cross a page bounary. Try to catch strings on the stack by
         * checking that 'size' is a power of two.
         */
        bad_size = size > PAGE_SIZE || !is_power_of_2(size);

        WARN_ON(!IS_ALIGNED((unsigned long)va, size) || bad_size);

AFAIK EFI is using the identity map, and the kernel stack is
physically contiguous, so crossing a page boundary should not a
problem. Also notice how the bad_size thing is talking about
page boundary crossing, but the thing triggering is the
IS_ALIGNED check. AFAIK there is no requirement for a struct, e.g.
an UUID (which is the problem here) to be aligned to its size,
it just needs to be 8 byte / 64 bit aligned, which it is yet
the IS_ALIGNED check is failing because it is checking for
a larger, in this case twice as large, but possibly it will
end up checking for a much larger alignment.

As said I'm not exactly familiar with the alignment requirements
but the current check certainly seems wrong.

It seems that we have had this around for a while, it stems from:

###
From f6697df36bdf0bf7fce984605c2918d4a7b4269f Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
Date: Sat, 12 Nov 2016 21:04:24 +0000
Subject: [PATCH v3] x86/efi: Prevent mixed mode boot corruption with
 CONFIG_VMAP_STACK=y

Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497d5 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Signed-off-by: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
###

But I guess the recent changes are triggering the alignment check now.

Alternatively we could make the GUID kalloc-ed instead of having it
on the stack, but that seems unnecessary.

Andy, Matt, do you have any input on this?

Regards,

Hans





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux