On Wed, Aug 30, 2023 at 12:45:22PM -0400, Radu Rendec wrote:
> On Wed, 2023-08-30 at 16:47 +0100, Sudeep Holla wrote:
> > On Wed, Aug 30, 2023 at 08:13:09AM -0400, Radu Rendec wrote:
> > > On Wed, 2023-08-30 at 12:49 +0100, Sudeep Holla wrote:
> > > > On Fri, Aug 04, 2023 at 06:24:19PM -0700, Ricardo Neri wrote:
> > > > > Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
> > > > > adds functionality that architectures can use to optionally allocate and
> > > > > build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add
> > > > > arch specific early level initializer") lets secondary CPUs correct (and
> > > > > reallocate memory) cacheinfo data if needed.
> > > > >
> > > > > If the early build functionality is not used and cacheinfo does not need
> > > > > correction, memory for cacheinfo is never allocated. x86 does not use the
> > > > > early build functionality. Consequently, during the cacheinfo CPU hotplug
> > > > > callback, last_level_cache_is_valid() attempts to dereference a NULL
> > > > > pointer:
> > > > >
> > > > >   BUG: kernel NULL pointer dereference, address: 0000000000000100
> > > > >   #PF: supervisor read access in kernel mode
> > > > >   #PF: error_code(0x0000) - not present page
> > > > >   PGD 0 P4D 0
> > > > >   Oops: 0000 [#1] PREEPMT SMP NOPTI
> > > > >   CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1
> > > > >   RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
> > > > >
> > > > > Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if
> > > > > not done earlier.
> > > > >
> > > > > Cc: Andreas Herrmann <aherrmann@xxxxxxxx>
> > > > > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > > > Cc: Chen Yu <yu.c.chen@xxxxxxxxx>
> > > > > Cc: Len Brown <len.brown@xxxxxxxxx>
> > > > > Cc: Radu Rendec <rrendec@xxxxxxxxxx>
> > > > > Cc: Pierre Gondois <Pierre.Gondois@xxxxxxx>
> > > > > Cc: Pu Wen <puwen@xxxxxxxx>
> > > > > Cc: "Rafael J. Wysocki" <rafael.j.wysocki@xxxxxxxxx>
> > > > > Cc: Sudeep Holla <sudeep.holla@xxxxxxx>
> > > > > Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > > > > Cc: Will Deacon <will@xxxxxxxxxx>
> > > > > Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
> > > > > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > Acked-by: Len Brown <len.brown@xxxxxxxxx>
> > > > > Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
> > > >
> > > > Not sure if we strictly need this(details below), but I am fine either way.
> > > >
> > > > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > > > > ---
> > > > > The motivation for commit 5944ce092b97 was to prevent a BUG splat in
> > > > > PREEMPT_RT kernels during memory allocation. This splat is not observed on
> > > > > x86 because the memory allocation for cacheinfo happens in
> > > > > detect_cache_attributes() from the cacheinfo CPU hotplug callback.
> > > > >
> > > > > The dereference of a NULL pointer is not observed today because
> > > > > cache_leaves(cpu) is zero until after init_cache_level() is called (also
> > > > > during the CPU hotplug callback). Patch2 will set it earlier and the NULL-
> > > > > pointer dereference will be observed.
> > > >
> > > > Right, this is the information I have been asking in the previous versions.
> > > > This clarifies a lot. The trigger is in the patch 2/3 which is why it didn't
> > > > make complete sense to me without it when you posted this patch independently.
> > > > Thanks for posting it together and sorry for the delay(both reviewing this
> > > > and in understanding the issue).
> > > >
> > > > Given the trigger for NULL pointer dereference is in 2/3, I am not sure
> > > > if it is really worth applying this to all the stable kernels with the
> > > > commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> > > > That is the reason why I asked to drop fixes tag if you agree with me.
> > > > It is simple fix, so I am OK if you prefer to see that in the stable kernels
> > > > as well.
> > > >
> > > Thanks for reviewing, Sudeep. Since my previous commit 6539cffa9495
> > > ("cacheinfo: Add arch specific early level initializer") opens a door
> > > for the NULL pointer dereference, I would sleep better at night if the
> > > fix was included in the stable kernels :) But seriously, I am concerned
> > > that with the fix applied in mainline and not in stable, something else
> > > could be backported to the stable in the future, that could trigger the
> > > NULL pointer dereference there. Ricardo's patch 2/3 is one way to
> > > trigger it, but you never know what other patch lands in mainline in
> > > the future that assumes it's safe to set the cache leaves earlier.
> > >
> > >
> > Fair enough. I agree with you, so please retain the fixes tag as is.
> > Please work with x86 maintainers to get it merged along with other patches.
> > Let me know if you have other plans.
> >
> Thanks, Sudeep. Technically, these are Ricardo's patches, so I will let
> him engage with the x86 maintainers and drive the integration work. But
> the plan looks good to me, and I will stand by and offer any support
> may be needed for the fix patch.

Thank you very much, Sudeep and Radu, for your feedback and review!

The x86 maintainers are in the To: field of this patchset. The patches
apply cleanly on top of the latest tip/master, but not on top of Thomas's
latest rework of the topology evaluation. So I am not sure when, or
whether, this patchset will be merged.

Thanks and BR,
Ricardo
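
A rough user-space sketch of the guard discussed in this thread, not the
actual patch: the hotplug-time path allocates the cacheinfo array if no
earlier step did, so the last-level check never sees a NULL pointer even
when the leaf count was set early. All names ending in _sketch are
hypothetical stand-ins for the kernel structures and functions, and calloc()
stands in for the kernel allocator.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one cache leaf. */
struct cacheinfo_sketch {
	unsigned int level;	/* 0 means "not populated yet" */
};

/* Hypothetical, simplified stand-in for the per-CPU cacheinfo. */
struct cpu_cacheinfo_sketch {
	struct cacheinfo_sketch *info_list;	/* NULL until allocated */
	unsigned int num_leaves;		/* may be set before allocation */
};

/* Allocate the leaf array. Returns 0 on success or -ENOMEM. */
int allocate_cache_info_sketch(struct cpu_cacheinfo_sketch *ci)
{
	ci->info_list = calloc(ci->num_leaves, sizeof(*ci->info_list));
	return ci->info_list ? 0 : -ENOMEM;
}

/*
 * Reads the last leaf. Like the function named in the oops, it assumes
 * info_list is valid whenever num_leaves is non-zero.
 */
bool last_level_cache_is_valid_sketch(const struct cpu_cacheinfo_sketch *ci)
{
	if (!ci->num_leaves)
		return false;
	return ci->info_list[ci->num_leaves - 1].level != 0;
}

/* Fill in placeholder levels 1..num_leaves (illustrative data only). */
void populate_leaves_sketch(struct cpu_cacheinfo_sketch *ci)
{
	for (unsigned int i = 0; i < ci->num_leaves; i++)
		ci->info_list[i].level = i + 1;
}

/*
 * Hotplug-time entry point: allocate only if nobody did so earlier,
 * then it is safe to run the last-level check.
 */
int cacheinfo_cpu_online_sketch(struct cpu_cacheinfo_sketch *ci)
{
	int ret;

	if (!ci->num_leaves)
		return -ENOENT;

	if (!ci->info_list) {
		ret = allocate_cache_info_sketch(ci);
		if (ret)
			return ret;
	}

	if (!last_level_cache_is_valid_sketch(ci))
		populate_leaves_sketch(ci);

	return 0;
}
```

Calling cacheinfo_cpu_online_sketch() on a structure with num_leaves set and
info_list still NULL mirrors the x86 case after patch 2/3: without the
allocation guard, the last-level check would dereference NULL. A second call
reuses the existing array instead of allocating again.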