On 10/16/23 18:31, Kirill A. Shutemov wrote:
> Michael reported soft lockups on a system that has unaccepted memory.
> This occurs when a user attempts to allocate and accept memory on
> multiple CPUs simultaneously.
>
> The root cause of the issue is that memory acceptance is serialized with
> a spinlock, allowing only one CPU to accept memory at a time. The other
> CPUs spin and wait for their turn, leading to starvation and soft lockup
> reports.
>
> To address this, the code has been modified to release the spinlock
> while accepting memory. This allows for parallel memory acceptance on
> multiple CPUs.
>
> A newly introduced "accepting_list" keeps track of which memory is
> currently being accepted. This is necessary to prevent parallel
> acceptance of the same memory block. If a collision occurs, the lock is
> released and the process is retried.
>
> Such collisions should rarely occur. The main path for memory acceptance
> is the page allocator, which accepts memory in MAX_ORDER chunks. As long
> as MAX_ORDER is equal to or larger than the unit_size, collisions will
> never occur because the caller fully owns the memory block being
> accepted.
>
> Aside from the page allocator, only memblock and deferred_free_range()
> accept memory, but this only happens during boot.
>
> The code has been tested with unit_size == 128MiB to trigger collisions
> and validate the retry codepath.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Reported-by: Michael Roth <michael.roth@xxxxxxx>
> Fixes: 2053bc57f367 ("efi: Add unaccepted memory support")
> Cc: <stable@xxxxxxxxxx>
> Reviewed-by: Nikolay Borisov <nik.borisov@xxxxxxxx>

Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>

<snip>

> +	range_start = range.start;
> 	for_each_set_bitrange_from(range_start, range_end, unaccepted->bitmap,
> -				   DIV_ROUND_UP(end, unit_size)) {
> +				   range.end) {
> 		unsigned long phys_start, phys_end;
> 		unsigned long len = range_end - range_start;
>
> 		phys_start = range_start * unit_size + unaccepted->phys_base;
> 		phys_end = range_end * unit_size + unaccepted->phys_base;
>
> +		/*
> +		 * Keep interrupts disabled until the accept operation is
> +		 * complete in order to prevent deadlocks.
> +		 *
> +		 * Enabling interrupts before calling arch_accept_memory()
> +		 * creates an opportunity for an interrupt handler to request
> +		 * acceptance for the same memory. The handler will continuously
> +		 * spin with interrupts disabled, preventing other tasks from
> +		 * making progress with the acceptance process.
> +		 */

AFAIU, on PREEMPT_RT spin_lock_irqsave() doesn't disable interrupts, so
this does not actually leave them disabled. But it also shouldn't risk a
deadlock, because on PREEMPT_RT the interrupt handlers are themselves
preemptible. The latency might be bad, as the cpu_relax() retry loop
won't cause the task everyone is waiting for to be prioritised, but I
guess that's not a big issue: anyone with RT requirements probably won't
use unaccepted memory in the first place, and, as you mention, hitting
the retry loop after boot in a normal configuration should pretty much
never happen.

> +		spin_unlock(&unaccepted_memory_lock);
> +
> 		arch_accept_memory(phys_start, phys_end);
> +
> +		spin_lock(&unaccepted_memory_lock);
> 		bitmap_clear(unaccepted->bitmap, range_start, len);
> 	}
> +
> +	list_del(&range.list);
> 	spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
> }
>
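For anyone skimming the thread, the publish/collide/retry scheme the
patch implements can be sketched in plain C. This is a single-threaded
userspace model, not the kernel code: the spinlock and cpu_relax()
points are only marked in comments, the bitmap is a bool array, and
arch_accept_memory() is replaced by a counter. Names like
accepting_list mirror the patch; everything else is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUNITS 64			/* memory, in units of unit_size */

struct range { unsigned long start, end; struct range *next; };

static struct range *accepting_list;	/* ranges being accepted right now */
static bool bitmap[NUNITS];		/* true = unit still unaccepted */
static int slow_accepts;		/* arch_accept_memory() stand-in */

/* Collision check: does [start, end) overlap any in-flight range? */
static bool overlaps_accepting(unsigned long start, unsigned long end)
{
	for (struct range *r = accepting_list; r; r = r->next)
		if (start < r->end && r->start < end)
			return true;
	return false;
}

static void accept_range(unsigned long start, unsigned long end)
{
	struct range r = { .start = start, .end = end };

retry:
	/* spin_lock_irqsave(&unaccepted_memory_lock, flags); */
	if (overlaps_accepting(start, end)) {
		/*
		 * Another CPU is accepting the same memory: drop the
		 * lock, cpu_relax(), and try again.  Never taken in
		 * this single-threaded model.
		 */
		goto retry;
	}

	/* Publish the in-flight range so parallel callers collide with it. */
	r.next = accepting_list;
	accepting_list = &r;

	for (unsigned long i = start; i < end; i++) {
		if (!bitmap[i])
			continue;	/* already accepted, skip */
		/* spin_unlock(&unaccepted_memory_lock); */
		slow_accepts++;		/* the slow, unlocked part */
		/* spin_lock(&unaccepted_memory_lock); */
		bitmap[i] = false;	/* bitmap_clear() */
	}

	/* Unpublish: list_del(&range.list) in the patch. */
	struct range **pp = &accepting_list;
	while (*pp != &r)
		pp = &(*pp)->next;
	*pp = r.next;
	/* spin_unlock_irqrestore(&unaccepted_memory_lock, flags); */
}
```

The key property is visible even without concurrency: the range is
published under the lock before any slow work starts, so an overlapping
caller observes it and retries rather than accepting the same units, and
the per-unit bitmap check under the lock makes overlapping but
serialized calls skip already-accepted units.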