Re: [PATCH 5.15 108/476] mm/sparsemem: fix race in accessing memory_section->usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/25/24 16:55, Donald Buczek wrote:
> On 2/21/24 14:02, Greg Kroah-Hartman wrote:
>> 5.15-stable review patch.  If anyone has any objections, please let me know.
>>
>> ------------------
>>
>> From: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
>>
>> [ Upstream commit 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 ]
>>
>> The below race is observed on a PFN which falls into the device memory
>> region with the system memory configuration where PFN's are such that
>> [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL].  Since normal zone start and end
>> pfn contains the device memory PFN's as well, the compaction triggered
>> will try on the device memory PFN's too though they end up in NOP(because
>> pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections).  When
>> from other core, the section mappings are being removed for the
>> ZONE_DEVICE region, that the PFN in question belongs to, on which
>> compaction is currently being operated is resulting into the kernel crash
>> with CONFIG_SPASEMEM_VMEMAP enabled.  The crash logs can be seen at [1].
>>
>> compact_zone()			memunmap_pages
>> -------------			---------------
>> __pageblock_pfn_to_page
>>    ......
>>  (a)pfn_valid():
>>      valid_section()//return true
>> 			      (b)__remove_pages()->
>> 				  sparse_remove_section()->
>> 				    section_deactivate():
>> 				    [Free the array ms->usage and set
>> 				     ms->usage = NULL]
>>      pfn_section_valid()
>>      [Access ms->usage which
>>      is NULL]
>>
>> NOTE: From the above it can be said that the race is reduced to between
>> the pfn_valid()/pfn_section_valid() and the section deactivate with
>> SPASEMEM_VMEMAP enabled.
>>
>> The commit b943f045a9af("mm/sparse: fix kernel crash with
>> pfn_section_valid check") tried to address the same problem by clearing
>> the SECTION_HAS_MEM_MAP with the expectation of valid_section() returns
>> false thus ms->usage is not accessed.
>>
>> Fix this issue by the below steps:
>>
>> a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.
>>
>> b) RCU protected read side critical section will either return NULL
>>    when SECTION_HAS_MEM_MAP is cleared or can successfully access ->usage.
>>
>> c) Free the ->usage with kfree_rcu() and set ms->usage = NULL.  No
>>    attempt will be made to access ->usage after this as the
>>    SECTION_HAS_MEM_MAP is cleared thus valid_section() return false.
>>
>> Thanks to David/Pavan for their inputs on this patch.
>>
>> [1] https://lore.kernel.org/linux-mm/994410bb-89aa-d987-1f50-f514903c55aa@xxxxxxxxxxx/
>>
>> On Snapdragon SoC, with the mentioned memory configuration of PFN's as
>> [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL], we are able to see bunch of
>> issues daily while testing on a device farm.
>>
>> For this particular issue below is the log.  Though the below log is
>> not directly pointing to the pfn_section_valid(){ ms->usage;}, when we
>> loaded this dump on T32 lauterbach tool, it is pointing.
>>
>> [  540.578056] Unable to handle kernel NULL pointer dereference at
>> virtual address 0000000000000000
>> [  540.578068] Mem abort info:
>> [  540.578070]   ESR = 0x0000000096000005
>> [  540.578073]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [  540.578077]   SET = 0, FnV = 0
>> [  540.578080]   EA = 0, S1PTW = 0
>> [  540.578082]   FSC = 0x05: level 1 translation fault
>> [  540.578085] Data abort info:
>> [  540.578086]   ISV = 0, ISS = 0x00000005
>> [  540.578088]   CM = 0, WnR = 0
>> [  540.579431] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBSBTYPE=--)
>> [  540.579436] pc : __pageblock_pfn_to_page+0x6c/0x14c
>> [  540.579454] lr : compact_zone+0x994/0x1058
>> [  540.579460] sp : ffffffc03579b510
>> [  540.579463] x29: ffffffc03579b510 x28: 0000000000235800 x27:000000000000000c
>> [  540.579470] x26: 0000000000235c00 x25: 0000000000000068 x24:ffffffc03579b640
>> [  540.579477] x23: 0000000000000001 x22: ffffffc03579b660 x21:0000000000000000
>> [  540.579483] x20: 0000000000235bff x19: ffffffdebf7e3940 x18:ffffffdebf66d140
>> [  540.579489] x17: 00000000739ba063 x16: 00000000739ba063 x15:00000000009f4bff
>> [  540.579495] x14: 0000008000000000 x13: 0000000000000000 x12:0000000000000001
>> [  540.579501] x11: 0000000000000000 x10: 0000000000000000 x9 :ffffff897d2cd440
>> [  540.579507] x8 : 0000000000000000 x7 : 0000000000000000 x6 :ffffffc03579b5b4
>> [  540.579512] x5 : 0000000000027f25 x4 : ffffffc03579b5b8 x3 :0000000000000001
>> [  540.579518] x2 : ffffffdebf7e3940 x1 : 0000000000235c00 x0 :0000000000235800
>> [  540.579524] Call trace:
>> [  540.579527]  __pageblock_pfn_to_page+0x6c/0x14c
>> [  540.579533]  compact_zone+0x994/0x1058
>> [  540.579536]  try_to_compact_pages+0x128/0x378
>> [  540.579540]  __alloc_pages_direct_compact+0x80/0x2b0
>> [  540.579544]  __alloc_pages_slowpath+0x5c0/0xe10
>> [  540.579547]  __alloc_pages+0x250/0x2d0
>> [  540.579550]  __iommu_dma_alloc_noncontiguous+0x13c/0x3fc
>> [  540.579561]  iommu_dma_alloc+0xa0/0x320
>> [  540.579565]  dma_alloc_attrs+0xd4/0x108
>>
>> [quic_charante@xxxxxxxxxxx: use kfree_rcu() in place of synchronize_rcu(), per David]
>>   Link: https://lkml.kernel.org/r/1698403778-20938-1-git-send-email-quic_charante@xxxxxxxxxxx
>> Link: https://lkml.kernel.org/r/1697202267-23600-1-git-send-email-quic_charante@xxxxxxxxxxx
>> Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot")
>> Signed-off-by: Charan Teja Kalla <quic_charante@xxxxxxxxxxx>
>> Cc: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx>
>> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
>> Cc: David Hildenbrand <david@xxxxxxxxxx>
>> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>> Cc: Oscar Salvador <osalvador@xxxxxxx>
>> Cc: Vlastimil Babka <vbabka@xxxxxxx>
>> Cc: <stable@xxxxxxxxxxxxxxx>
>> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
>> ---
>>  include/linux/mmzone.h | 14 +++++++++++---
>>  mm/sparse.c            | 17 +++++++++--------
>>  2 files changed, 20 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 9e1485083398..8b8349ffa1cd 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1287,6 +1287,7 @@ static inline unsigned long section_nr_to_pfn(unsigned long sec)
>>  #define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)
>>  
>>  struct mem_section_usage {
>> +	struct rcu_head rcu;
>>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>>  	DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION);
>>  #endif
>> @@ -1457,7 +1458,7 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>>  {
>>  	int idx = subsection_map_index(pfn);
>>  
>> -	return test_bit(idx, ms->usage->subsection_map);
>> +	return test_bit(idx, READ_ONCE(ms->usage)->subsection_map);
>>  }
>>  #else
>>  static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>> @@ -1481,6 +1482,7 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>>  static inline int pfn_valid(unsigned long pfn)
>>  {
>>  	struct mem_section *ms;
>> +	int ret;
>>  
>>  	/*
>>  	 * Ensure the upper PAGE_SHIFT bits are clear in the
>> @@ -1494,13 +1496,19 @@ static inline int pfn_valid(unsigned long pfn)
>>  	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>>  		return 0;
>>  	ms = __pfn_to_section(pfn);
>> -	if (!valid_section(ms))
>> +	rcu_read_lock();
>> +	if (!valid_section(ms)) {
>> +		rcu_read_unlock();
>>  		return 0;
>> +	}
>>  	/*
>>  	 * Traditionally early sections always returned pfn_valid() for
>>  	 * the entire section-sized span.
>>  	 */
>> -	return early_section(ms) || pfn_section_valid(ms, pfn);
>> +	ret = early_section(ms) || pfn_section_valid(ms, pfn);
>> +	rcu_read_unlock();
>> +
>> +	return ret;
>>  }
>>  #endif
>>  
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index 120bc8ea5293..27092badd15b 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -789,6 +789,13 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>  	if (empty) {
>>  		unsigned long section_nr = pfn_to_section_nr(pfn);
>>  
>> +		/*
>> +		 * Mark the section invalid so that valid_section()
>> +		 * return false. This prevents code from dereferencing
>> +		 * ms->usage array.
>> +		 */
>> +		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>> +
>>  		/*
>>  		 * When removing an early section, the usage map is kept (as the
>>  		 * usage maps of other sections fall into the same page). It
>> @@ -797,16 +804,10 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
>>  		 * was allocated during boot.
>>  		 */
>>  		if (!PageReserved(virt_to_page(ms->usage))) {
>> -			kfree(ms->usage);
>> -			ms->usage = NULL;
>> +			kfree_rcu(ms->usage, rcu);
>> +			WRITE_ONCE(ms->usage, NULL);
>>  		}
>>  		memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
>> -		/*
>> -		 * Mark the section invalid so that valid_section()
>> -		 * return false. This prevents code from dereferencing
>> -		 * ms->usage array.
>> -		 */
>> -		ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
>>  	}
>>  
>>  	/*
> 
> Maybe this is already known, but just FYI I wanted to drop the note that for some
> reasons I don't understand, this patch prevents me from compiling the proprietary
> Nvidia Unix driver with versions 510.108.03 and 535.104.05 in the 5.15 series.
> No problems with the 6.5 series, although the patch is included there as well.
> 
> linux-5.15.155 , nvidia 535.104.05 :
> 
>     make -j $(nproc) CC=gcc SYSSRC=/scratch/local/linux clean && make -j $(nproc) CC=gcc SYSSRC=/scratch/local/linux V=2 modules
>     [...]
>     MODPOST /scratch/local/bee-buczek/nvidia/test_535-104-05/kernel/Module.symvers - due to target missing
>     ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
>     make[2]: *** [scripts/Makefile.modpost:133: /scratch/local/bee-buczek/nvidia/test_535-104-05/kernel/Module.symvers] Error 1
>     make[2]: *** Deleting file '/scratch/local/bee-buczek/nvidia/test_535-104-05/kernel/Module.symvers'
>     make[1]: *** [Makefile:1826: modules] Error 2
>     make[1]: Leaving directory '/scratch/local/linux'
>     make: *** [Makefile:82: modules] Error 2
> 
> 
> With fd117d03cefd15 ("mm/sparsemem: fix race in accessing memory_section->usage") reverted,
> the build completes.

Sorry, that was my revert patch. Its b448de2459b6d6 ("mm/sparsemem: fix race in accessing memory_section->usage").

D.

> 
> Best
>   Donald

-- 
Donald Buczek
buczek@xxxxxxxxxxxxx
Tel: +49 30 8413 1433





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux