On 10/13/23 11:22, Kirill A. Shutemov wrote:
On Fri, Oct 13, 2023 at 03:33:58PM +0300, Kirill A. Shutemov wrote:
While testing SNP guests running today's tip/master (ef19bc9dddc3) I ran
into what seems to be fairly significant lock contention due to the
unaccepted_memory_lock spinlock above, which results in a constant stream
of soft-lockups until the workload gets all its memory accepted/faulted
in if the guest has around 16+ vCPUs.
I've included the guest dmesg traces I was seeing below.
In this case I was running a 32 vCPU guest with 200GB of memory running on
a 256 thread EPYC (Milan) system, and can trigger the above situation fairly
reliably by running the following workload in a freshly-booted guests:
stress --vm 32 --vm-bytes 5G --vm-keep
Scaling up the number of stress threads and vCPUs should make it easier
to reproduce.
Other than unresponsiveness/lockup messages until the memory is accepted,
the guest seems to continue running fine, but for large guests where
unaccepted memory is more likely to be useful, it seems like it could be
an issue, especially when consider 100+ vCPU guests.
Okay, sorry for delay. It took time to reproduce it with TDX.
I will look what can be done.
Could you check if the patch below helps?
diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
index 853f7dc3c21d..591da3f368fa 100644
--- a/drivers/firmware/efi/unaccepted_memory.c
+++ b/drivers/firmware/efi/unaccepted_memory.c
@@ -8,6 +8,14 @@
/* Protects unaccepted memory bitmap */
static DEFINE_SPINLOCK(unaccepted_memory_lock);
+struct accept_range {
+ struct list_head list;
+ unsigned long start;
+ unsigned long end;
+};
+
+static LIST_HEAD(accepting_list);
+
/*
* accept_memory() -- Consult bitmap and accept the memory if needed.
*
@@ -24,6 +32,7 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
{
struct efi_unaccepted_memory *unaccepted;
unsigned long range_start, range_end;
+ struct accept_range range, *entry;
unsigned long flags;
u64 unit_size;
@@ -80,7 +89,25 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
range_start = start / unit_size;
+ range.start = start;
+ range.end = end;
+retry:
spin_lock_irqsave(&unaccepted_memory_lock, flags);
+
+ list_for_each_entry(entry, &accepting_list, list) {
+ if (entry->end < start)
+ continue;
+ if (entry->start > end)
Should this be a >= check since start and end are page aligned values?
+ continue;
+ spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+
+ /* Somebody else accepting the range */
+ cpu_relax();
+ goto retry;
Could you set some kind of flag here so that ...
+ }
+
... once you get here, that means that area was accepted and removed from
the list, so I think you could just drop the lock and exit now, right?
Because at that point the bitmap will have been updated and you wouldn't
be accepting any memory anyway?
Thanks,
Tom
+ list_add(&range.list, &accepting_list);
+
for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
DIV_ROUND_UP(end, unit_size)) {
unsigned long phys_start, phys_end;
@@ -89,9 +116,15 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
phys_start = range_start * unit_size + unaccepted->phys_base;
phys_end = range_end * unit_size + unaccepted->phys_base;
+ spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
+
arch_accept_memory(phys_start, phys_end);
+
+ spin_lock_irqsave(&unaccepted_memory_lock, flags);
bitmap_clear(unaccepted->bitmap, range_start, len);
}
+
+ list_del(&range.list);
spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
}