On Fri, Mar 08, 2024 at 03:01:28PM +0100, Marek Szyprowski wrote:
> On 07.03.2024 02:45, Christoph Lameter (Ampere) wrote:
> > Currently defconfig selects NR_CPUS=256, but some vendors (e.g. Ampere
> > Computing) are planning to ship systems with 512 CPUs. So that all CPUs on
> > these systems can be used with defconfig, we'd like to bump NR_CPUS to 512.
> > Therefore this patch increases the default NR_CPUS from 256 to 512.
> >
> > As increasing NR_CPUS will increase the size of cpumasks, there's a fear that
> > this might have a significant impact on stack usage due to code which places
> > cpumasks on the stack. To mitigate that concern, we can select
> > CPUMASK_OFFSTACK. As that doesn't seem to be a problem today with
> > NR_CPUS=256, we only select this when NR_CPUS > 256.
> >
> > CPUMASK_OFFSTACK configures the cpumasks in the kernel to be
> > dynamically allocated. This was used in the X86 architecture in the
> > past to enable support for larger CPU configurations up to 8k cpus.
[...]
> This patch landed in today's linux-next as commit 0499a78369ad ("ARM64:
> Dynamically allocate cpumasks and increase supported CPUs to 512").
> Unfortunately it triggers the following warning during boot on most of
> my ARM64-based test boards. Here is an example from Odroid-N2 board:

I spent a big part of this afternoon going through the code paths but
there's nothing obvious that triggered this problem. My suspicion is
some memory corruption, algorithmically I can't see anything that could
go wrong with CPUMASK_OFFSTACK. Unfortunately I could not reproduce it
yet to be able to add some debug info.

So I decided to revert this patch. If we get to the bottom of it during
the merging window, I can still revive it. Otherwise we'll add it to
linux-next post -rc1.

Thanks for reporting it and subsequent debugging.

--
Catalin