Re: [PATCHv14 5/9] efi: Add unaccepted memory support

"Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> · Fri, 13 Oct 2023 15:33:58 +0300

On Tue, Oct 10, 2023 at 04:05:18PM -0500, Michael Roth wrote:
> On Tue, Jun 06, 2023 at 05:26:33PM +0300, Kirill A. Shutemov wrote:
> > efi_config_parse_tables() reserves memory that holds unaccepted memory
> > configuration table so it won't be reused by page allocator.
> > 
> > Core-mm requires a few helpers to support unaccepted memory:
> > 
> >  - accept_memory() checks the range of addresses against the bitmap and
> >    accepts memory if needed.
> > 
> >  - range_contains_unaccepted_memory() checks if anything within the
> >    range requires acceptance.
> > 
> > Architectural code has to provide efi_get_unaccepted_table() that
> > returns a pointer to the unaccepted memory configuration table.
> > 
> > arch_accept_memory() handles the arch-specific part of memory acceptance.
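To make the helper contract concrete, here is a rough userspace model of the check that range_contains_unaccepted_memory() performs. It is only a sketch: struct model_table and the model_* names are hypothetical stand-ins for struct efi_unaccepted_memory and the kernel bitmap helpers, and it assumes a byte-addressed, LSB-first bitmap (matching x86's little-endian layout). As the accept_memory() comment in this patch describes, memory below phys_base or past the end of the bitmap is treated as already accepted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct efi_unaccepted_memory. */
struct model_table {
	uint64_t phys_base;     /* physical address covered by bit 0 */
	uint64_t unit_size;     /* bytes of memory per bit */
	uint64_t size;          /* bitmap length in bytes */
	unsigned char *bitmap;  /* set bit => unit is still unaccepted */
};

static bool model_test_bit(const unsigned char *bm, uint64_t bit)
{
	return (bm[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Model of range_contains_unaccepted_memory(): true if any unit that
 * overlaps [start, end) still has its bit set.
 */
bool model_range_contains_unaccepted(const struct model_table *t,
				     uint64_t start, uint64_t end)
{
	uint64_t limit, bit;

	if (!t)                        /* no table: nothing to accept */
		return false;
	if (end <= t->phys_base)       /* range ends before the bitmap starts */
		return false;
	if (start < t->phys_base)
		start = t->phys_base;

	/* Offsets from phys_base, clamped to what the bitmap covers. */
	start -= t->phys_base;
	end -= t->phys_base;
	limit = t->size * 8 * t->unit_size;
	if (end > limit)
		end = limit;

	/* Test every unit that overlaps [start, end). */
	for (bit = start / t->unit_size; bit * t->unit_size < end; bit++)
		if (model_test_bit(t->bitmap, bit))
			return true;
	return false;
}
```

For example, with phys_base = 1 MiB, unit_size = 2 MiB and only bit 9 set, any range overlapping [19 MiB, 21 MiB) reports unaccepted memory and everything else does not.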
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > Reviewed-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> > Reviewed-by: Tom Lendacky <thomas.lendacky@xxxxxxx>
> > ---
> >  arch/x86/platform/efi/efi.c              |   3 +
> >  drivers/firmware/efi/Makefile            |   1 +
> >  drivers/firmware/efi/efi.c               |  25 +++++
> >  drivers/firmware/efi/unaccepted_memory.c | 112 +++++++++++++++++++++++
> >  include/linux/efi.h                      |   1 +
> >  5 files changed, 142 insertions(+)
> >  create mode 100644 drivers/firmware/efi/unaccepted_memory.c
> > 
> > diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
> > new file mode 100644
> > index 000000000000..08a9a843550a
> > --- /dev/null
> > +++ b/drivers/firmware/efi/unaccepted_memory.c
> > @@ -0,0 +1,112 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +
> > +#include <linux/efi.h>
> > +#include <linux/memblock.h>
> > +#include <linux/spinlock.h>
> > +#include <asm/unaccepted_memory.h>
> > +
> > +/* Protects unaccepted memory bitmap */
> > +static DEFINE_SPINLOCK(unaccepted_memory_lock);
> > +
> > +/*
> > + * accept_memory() -- Consult bitmap and accept the memory if needed.
> > + *
> > + * Only memory that is explicitly marked as unaccepted in the bitmap requires
> > + * an action. All the remaining memory is implicitly accepted and doesn't need
> > + * acceptance.
> > + *
> > + * No need to accept:
> > + *  - anything if the system has no unaccepted table;
> > + *  - memory that is below phys_base;
> > + *  - memory that is above the memory that is addressable by the bitmap;
> > + */
> > +void accept_memory(phys_addr_t start, phys_addr_t end)
> > +{
> > +	struct efi_unaccepted_memory *unaccepted;
> > +	unsigned long range_start, range_end;
> > +	unsigned long flags;
> > +	u64 unit_size;
> > +
> > +	unaccepted = efi_get_unaccepted_table();
> > +	if (!unaccepted)
> > +		return;
> > +
> > +	unit_size = unaccepted->unit_size;
> > +
> > +	/*
> > +	 * Only care for the part of the range that is represented
> > +	 * in the bitmap.
> > +	 */
> > +	if (start < unaccepted->phys_base)
> > +		start = unaccepted->phys_base;
> > +	if (end < unaccepted->phys_base)
> > +		return;
> > +
> > +	/* Translate to offsets from the beginning of the bitmap */
> > +	start -= unaccepted->phys_base;
> > +	end -= unaccepted->phys_base;
> > +
> > +	/* Make sure not to overrun the bitmap */
> > +	if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
> > +		end = unaccepted->size * unit_size * BITS_PER_BYTE;
> > +
> > +	range_start = start / unit_size;
> > +
> > +	spin_lock_irqsave(&unaccepted_memory_lock, flags);
> > +	for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
> > +				   DIV_ROUND_UP(end, unit_size)) {
> > +		unsigned long phys_start, phys_end;
> > +		unsigned long len = range_end - range_start;
> > +
> > +		phys_start = range_start * unit_size + unaccepted->phys_base;
> > +		phys_end = range_end * unit_size + unaccepted->phys_base;
> > +
> > +		arch_accept_memory(phys_start, phys_end);
> > +		bitmap_clear(unaccepted->bitmap, range_start, len);
> > +	}
> > +	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> > +}
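A userspace model of the same control flow may help when reasoning about the rounding: the start is rounded down and the end rounded up to unit_size, so a range that only partially covers a unit still accepts (and clears) the whole unit. This is a sketch under assumptions: it drops the spinlock, struct model_table is a hypothetical stand-in for the EFI table, fake_arch_accept_memory() is a hypothetical byte counter rather than the real arch hook, and the inner loop stands in for for_each_set_bitrange_from().

```c
#include <stdint.h>

/* Hypothetical stand-in for struct efi_unaccepted_memory. */
struct model_table {
	uint64_t phys_base;
	uint64_t unit_size;
	uint64_t size;           /* bitmap length in bytes */
	unsigned char *bitmap;   /* set bit => unit still unaccepted */
};

/* Hypothetical stand-in for arch_accept_memory(): only counts bytes. */
static uint64_t accepted_bytes;

static void fake_arch_accept_memory(uint64_t start, uint64_t end)
{
	accepted_bytes += end - start;
}

static int bit_is_set(const unsigned char *bm, uint64_t bit)
{
	return (bm[bit / 8] >> (bit % 8)) & 1;
}

static void bit_clear(unsigned char *bm, uint64_t bit)
{
	bm[bit / 8] &= (unsigned char)~(1u << (bit % 8));
}

/* Same steps as accept_memory() above, without the spinlock. */
void model_accept_memory(struct model_table *t, uint64_t start, uint64_t end)
{
	uint64_t limit, bit, last;

	if (!t)
		return;
	if (start < t->phys_base)
		start = t->phys_base;
	if (end < t->phys_base)
		return;

	/* Offsets from phys_base, clamped so the bitmap is not overrun. */
	start -= t->phys_base;
	end -= t->phys_base;
	limit = t->size * 8 * t->unit_size;
	if (end > limit)
		end = limit;

	/* Start rounds down, end rounds up (DIV_ROUND_UP) to whole units. */
	last = (end + t->unit_size - 1) / t->unit_size;
	for (bit = start / t->unit_size; bit < last; bit++) {
		uint64_t run;

		if (!bit_is_set(t->bitmap, bit))
			continue;
		/* Find the end of this run of set bits. */
		for (run = bit; run < last && bit_is_set(t->bitmap, run); run++)
			;
		fake_arch_accept_memory(t->phys_base + bit * t->unit_size,
					t->phys_base + run * t->unit_size);
		/* Clear the run; bit 'run' is clear (or == last), so the
		 * loop's bit++ may safely skip it. */
		while (bit < run)
			bit_clear(t->bitmap, bit++);
	}
}
```

With phys_base = 0x1000, unit_size = 0x100 and bitmap 0x36 (bits 1, 2, 4 and 5 set), accepting [0x1150, 0x1450) covers bits 1 to 4, so it accepts [0x1100, 0x1300) and [0x1400, 0x1500) and leaves only bit 5 set; repeating the call is a no-op.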
> 
> While testing SNP guests running today's tip/master (ef19bc9dddc3) I ran
> into what seems to be fairly significant lock contention due to the
> unaccepted_memory_lock spinlock above, which results in a constant stream
> of soft-lockups until the workload gets all its memory accepted/faulted
> in if the guest has around 16+ vCPUs.
> 
> I've included the guest dmesg traces I was seeing below.
> 
> In this case I was running a 32-vCPU guest with 200GB of memory on a
> 256-thread EPYC (Milan) system, and can trigger the above situation fairly
> reliably by running the following workload in a freshly booted guest:
> 
>   stress --vm 32 --vm-bytes 5G --vm-keep
> 
> Scaling up the number of stress threads and vCPUs should make it easier
> to reproduce.
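For a guest without stress(1) installed, a rough C stand-in for the workload above follows: N threads, each allocating a buffer and writing one byte per page so that every page gets faulted in. This is an approximation, not stress's implementation: it touches each page once and then frees the buffer instead of re-dirtying it indefinitely as --vm-keep does, and run_vm_stress(), struct worker_arg and the 256-thread cap are hypothetical names and limits chosen for this sketch.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_WORKERS 256  /* arbitrary cap for this sketch */

struct worker_arg {
	size_t bytes;          /* like --vm-bytes */
	size_t pages_touched;  /* filled in by the worker */
};

static void *vm_worker(void *p)
{
	struct worker_arg *arg = p;
	long page = sysconf(_SC_PAGESIZE);
	/* volatile so the compiler cannot drop the stores as dead */
	volatile unsigned char *buf;

	if (page <= 0)
		return NULL;
	buf = malloc(arg->bytes);
	if (!buf)
		return NULL;

	/* One write per page: the first touch faults the page in. */
	for (size_t off = 0; off < arg->bytes; off += (size_t)page) {
		buf[off] = 1;
		arg->pages_touched++;
	}
	free((void *)buf);
	return NULL;
}

/* Like --vm nthreads; returns the total number of pages touched. */
size_t run_vm_stress(int nthreads, size_t bytes_per_thread)
{
	pthread_t tids[MAX_WORKERS];
	struct worker_arg args[MAX_WORKERS];
	size_t total = 0;
	int started = 0;

	if (nthreads <= 0 || nthreads > MAX_WORKERS)
		return 0;

	for (int i = 0; i < nthreads; i++) {
		args[i].bytes = bytes_per_thread;
		args[i].pages_touched = 0;
		if (pthread_create(&tids[i], NULL, vm_worker, &args[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += args[i].pages_touched;
	}
	return total;
}
```

Calling run_vm_stress(32, (size_t)5 << 30) in a freshly booted guest should roughly approximate the reported load (build with -pthread).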
> 
> Other than unresponsiveness/lockup messages until the memory is accepted,
> the guest seems to continue running fine, but for large guests where
> unaccepted memory is more likely to be useful, it seems like it could be
> an issue, especially when considering 100+ vCPU guests.

Okay, sorry for the delay. It took some time to reproduce it with TDX.

I will look into what can be done.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov