On 10/14/23 22:40, Kirill A. Shutemov wrote:
> Michael reported soft lockups on a system that has unaccepted memory.
> This occurs when a user attempts to allocate and accept memory on
> multiple CPUs simultaneously.
>
> The root cause of the issue is that memory acceptance is serialized with
> a spinlock, allowing only one CPU to accept memory at a time. The other
> CPUs spin and wait for their turn, leading to starvation and soft lockup
> reports.
>
> To address this, the code has been modified to release the spinlock
> while accepting memory. This allows for parallel memory acceptance on
> multiple CPUs.
>
> A newly introduced "accepting_list" keeps track of which memory is
> currently being accepted. This is necessary to prevent parallel
> acceptance of the same memory block. If a collision occurs, the lock is
> released and the process is retried.
>
> Such collisions should rarely occur. The main path for memory acceptance
> is the page allocator, which accepts memory in MAX_ORDER chunks. As long
> as MAX_ORDER is equal to or larger than the unit_size, collisions will
> never occur because the caller fully owns the memory block being
> accepted.
>
> Aside from the page allocator, only memblock and deferred_free_range()
> accept memory, but this only happens during boot.
>
> The code has been tested with unit_size == 128MiB to trigger collisions
> and validate the retry codepath.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Reported-by: Michael Roth <michael.roth@xxxxxxx>
> Fixes: 2053bc57f367 ("efi: Add unaccepted memory support")
> Cc: <stable@xxxxxxxxxx>
> ---
>  drivers/firmware/efi/unaccepted_memory.c | 55 ++++++++++++++++++++++--
>  1 file changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/firmware/efi/unaccepted_memory.c b/drivers/firmware/efi/unaccepted_memory.c
> index 853f7dc3c21d..8af0306c8e5c 100644
> --- a/drivers/firmware/efi/unaccepted_memory.c
> +++ b/drivers/firmware/efi/unaccepted_memory.c
> @@ -5,9 +5,17 @@
>  #include <linux/spinlock.h>
>  #include <asm/unaccepted_memory.h>
>
> -/* Protects unaccepted memory bitmap */
> +/* Protects unaccepted memory bitmap and accepting_list */
>  static DEFINE_SPINLOCK(unaccepted_memory_lock);
>
> +struct accept_range {
> +        struct list_head list;
> +        unsigned long start;
> +        unsigned long end;
> +};
> +
> +static LIST_HEAD(accepting_list);
> +
>  /*
>   * accept_memory() -- Consult bitmap and accept the memory if needed.
>   *
> @@ -24,6 +32,7 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
>  {
>          struct efi_unaccepted_memory *unaccepted;
>          unsigned long range_start, range_end;
> +        struct accept_range range, *entry;
>          unsigned long flags;
>          u64 unit_size;
>
> @@ -78,20 +87,58 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
>          if (end > unaccepted->size * unit_size * BITS_PER_BYTE)
>                  end = unaccepted->size * unit_size * BITS_PER_BYTE;
>
> -        range_start = start / unit_size;
> -
> +        range.start = start / unit_size;
> +        range.end = DIV_ROUND_UP(end, unit_size);
> +retry:
>          spin_lock_irqsave(&unaccepted_memory_lock, flags);
> +
> +        /*
> +         * Check if anybody works on accepting the same range of the memory.
> +         *
> +         * The check with unit_size granularity. It is crucial to catch all

"The check is done ..." ?

> +         * accept requests to the same unit_size block, even if they don't
> +         * overlap on physical address level.
> +         */
> +        list_for_each_entry(entry, &accepting_list, list) {
> +                if (entry->end < range.start)
> +                        continue;
> +                if (entry->start >= range.end)
> +                        continue;

Hmm, we really don't have a macro for ranges_intersect()? Given how easy
it is to make a mistake. I found only zone_intersects().

> +
> +                /*
> +                 * Somebody else accepting the range. Or at least part of it.
> +                 *
> +                 * Drop the lock and retry until it is complete.
> +                 */
> +                spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> +                cond_resched();
> +                goto retry;
> +        }
> +
> +        /*
> +         * Register that the range is about to be accepted.
> +         * Make sure nobody else will accept it.
> +         */
> +        list_add(&range.list, &accepting_list);
> +
> +        range_start = range.start;
>          for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
> -                                   DIV_ROUND_UP(end, unit_size)) {
> +                                   range.end) {
>                  unsigned long phys_start, phys_end;
>                  unsigned long len = range_end - range_start;
>
>                  phys_start = range_start * unit_size + unaccepted->phys_base;
>                  phys_end = range_end * unit_size + unaccepted->phys_base;
>
> +                spin_unlock_irqrestore(&unaccepted_memory_lock, flags);

Hm, so this is bad, AFAICS. We enable IRQs here, then an IRQ can come in
and try to accept memory in the same unit_size block, so it will keep
retrying via the goto above while itself having IRQs disabled, and the
cond_resched() will never let us finish?

> +
>                  arch_accept_memory(phys_start, phys_end);
> +
>                  spin_lock_irqsave(&unaccepted_memory_lock, flags);
>                  bitmap_clear(unaccepted->bitmap, range_start, len);
>          }
> +
> +        list_del(&range.list);
>          spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
>  }
>
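For reference, the kind of helper I had in mind above would be something
like the sketch below. To be clear, ranges_intersect() is made up here;
it doesn't exist in the tree as far as I can see, and the sketch assumes
half-open [start, end) ranges like the ones this patch builds with
DIV_ROUND_UP():

static inline bool ranges_intersect(unsigned long start1, unsigned long end1,
                                    unsigned long start2, unsigned long end2)
{
        /* Half-open ranges overlap iff each one starts before the other ends. */
        return start1 < end2 && start2 < end1;
}

With that, the check in the loop could become:

        list_for_each_entry(entry, &accepting_list, list) {
                if (!ranges_intersect(entry->start, entry->end,
                                      range.start, range.end))
                        continue;
                ...
        }

Note this is not a 1:1 replacement: the patch as posted also treats
entry->end == range.start (ranges that merely touch) as a collision and
retries, while a strict overlap test would not. That's harmless, just an
extra retry, but it's exactly the kind of off-by-one that makes me wish
for a common helper.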