On 15.10.23 00:25, Andrew Morton wrote:
On Fri, 13 Oct 2023 18:34:27 +0530 Charan Teja Kalla <quic_charante@xxxxxxxxxxx> wrote:
The below race is observed on a PFN which falls into the device memory
region with the system memory configuration where PFN's are such that
[ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL]. Since normal zone start and
end pfn contains the device memory PFN's as well, the compaction
triggered will try on the device memory PFN's too though they end up in
NOP(because pfn_to_online_page() returns NULL for ZONE_DEVICE memory
sections). When from other core, the section mappings are being removed
for the ZONE_DEVICE region, that the PFN in question belongs to,
on which compaction is currently being operated is resulting into the
kernel crash with CONFIG_SPASEMEM_VMEMAP enabled.
Seems this bug is four years old, yes? It must be quite hard to hit.
From the description, it's not quite clear to me if this was actually
hit -- usually people include the dmesg bug/crash info.
When people review this, please offer opinions on whether a fix should
be backported into -stable kernels, thanks.
compact_zone() memunmap_page
------------- ---------------
__pageblock_pfn_to_page
......
(a)pfn_valid():
valid_section()//return true
(b)__remove_pages()->
sparse_remove_section()->
section_deactivate():
[Free the array ms->usage and set
ms->usage = NULL]
pfn_section_valid()
[Access ms->usage which
is NULL]
NOTE: From the above it can be said that the race is reduced to between
the pfn_valid()/pfn_section_valid() and the section deactivate with
SPASEMEM_VMEMAP enabled.
The commit b943f045a9af("mm/sparse: fix kernel crash with
pfn_section_valid check") tried to address the same problem by clearing
the SECTION_HAS_MEM_MAP with the expectation of valid_section() returns
false thus ms->usage is not accessed.
Fix this issue by the below steps:
a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.
b) RCU protected read side critical section will either return NULL when
SECTION_HAS_MEM_MAP is cleared or can successfully access ->usage.
c) Synchronize the rcu on the write side and free the ->usage. No
attempt will be made to access ->usage after this as the
SECTION_HAS_MEM_MAP is cleared thus valid_section() return false.
This affects any kind of memory hotunplug. When hotunplugging memory we
will end up calling synchronize_rcu() for each and every memory section,
which sounds extremely wasteful.
Can't we find a way to kfree_rcu() that thing and read/write the pointer
using READ?ONCE?WRITE_ONCE instead?
--
Cheers,
David / dhildenb