On 11/22/2011 01:01 AM, Tejun Heo wrote:
> Hello, Dave.
>
> On Mon, Nov 21, 2011 at 09:45:51AM +0800, Dave Young wrote:
>> On 11/19/2011 02:55 AM, Tejun Heo wrote:
>>
>>> The percpu allocator recorded the cpus which map to the first and last
>>> units in pcpu_first/last_unit_cpu respectively and used them to
>>> determine the address range of a chunk - e.g. it assumed that the
>>> first unit has the lowest address in a chunk while the last unit has
>>> the highest address.
>>>
>>> This simply isn't true. Groups in a chunk can have arbitrary positive
>>> or negative offsets from the previous one and there is no guarantee
>>> that the first unit occupies the lowest offset while the last one the
>>> highest.
>>>
>>> Fix it by actually comparing unit offsets to determine the cpus
>>> occupying the lowest and highest offsets. Also, rename
>>> pcpu_first/last_unit_cpu to pcpu_low/high_unit_cpu to avoid confusion.
>>>
>>> The chunk address range is used to flush the cache on vmalloc area
>>> map/unmap and to decide whether a given address is in the first chunk
>>> by per_cpu_ptr_to_phys(), and the bug was discovered by an invalid
>>> per_cpu_ptr_to_phys() translation for crash_note.
>>>
>>> Kudos to Dave Young for tracking down the problem.
>>
>> Tejun, thanks
>>
>> Now, since an addr that is not in the first chunk must be in the
>> vmalloc area, the below logic should be right:
>>
>>	if (in_first_chunk) {
>>		if (!is_vmalloc_addr(addr))
>>			return __pa(addr);
>>		else
>>			return page_to_phys(vmalloc_to_page(addr));
>>	} else {
>>		if (!is_vmalloc_addr(addr)) /* not possible */
>>			return __pa(addr);
>>		else
>>			return page_to_phys(pcpu_addr_to_page(addr));
>>	}
>>
>> So how about simply removing the in-first-chunk check to simplify the
>> code as follows:
>>
>>	phys_addr_t per_cpu_ptr_to_phys(void *addr)
>>	{
>>		if (!is_vmalloc_addr(addr))
>>			return __pa(addr);
>>		else
>>			return page_to_phys(pcpu_addr_to_page(addr));
>>	}
>
> Yes, that's much simpler. Hmmm... I'm still slightly reluctant
> because the current code reflects better how the percpu allocator
> actually works. It has a special setup for the first chunk, which
> currently supports either embedding in the linear address space or
> vmalloc mapping, and, from the second chunk on, the backing allocator
> (currently either vm or km) provides the translation.
>
> per_cpu_ptr_to_phys() has been pretty good at exposing wrong
> assumptions in address translation, mostly because this is one of the
> few points where the assumptions are visible in a verifiable way (I
> think this is the second bug it has discovered).
>
> This forces the verification that "if it isn't in one of the explicit
> first chunk areas, it MUST follow the backing allocator mapping",
> which can catch bugs both in the percpu allocator itself and in
> per_cpu_ptr_to_phys() callers. e.g. with the proposed change, if a
> caller passes in a usual kernel address on percpu-vm which is not in
> the first chunk, it will return __pa for the address even though that
> address can't possibly be a percpu address, but the current code will
> trip VIRTUAL_BUG_ON() in vmalloc_to_page().
>
> Maybe we can add a comment there on how it can be further simplified
> and why things look more complicated than necessary?

I wonder if it would be good to add something like CONFIG_PER_CPU_DEBUG?
Anyway, I have no strong opinion on this; I'm fine with either adding
the comment or putting the check under a debug option.

Thank you for your fix.

--
Thanks
Dave
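
For reference, a minimal standalone sketch of the idea behind the quoted
fix. find_low_high_units() and its parameters are hypothetical names
invented for illustration; the real patch compares pcpu_unit_offsets[]
entries per possible cpu to pick pcpu_low/high_unit_cpu. The point is
that unit order is not address order, so the extremes must be found by
scanning, not assumed:

	#include <stdio.h>

	/*
	 * Hypothetical, simplified scan over per-unit offsets: find the
	 * units sitting at the lowest and highest offsets within a chunk,
	 * rather than assuming the first unit is lowest and the last unit
	 * is highest.
	 */
	static void find_low_high_units(const long *unit_off, int nr_units,
					int *low_unit, int *high_unit)
	{
		int u;

		*low_unit = *high_unit = 0;
		for (u = 1; u < nr_units; u++) {
			if (unit_off[u] < unit_off[*low_unit])
				*low_unit = u;
			if (unit_off[u] > unit_off[*high_unit])
				*high_unit = u;
		}
	}

	int main(void)
	{
		/* Group offsets may be negative, so unit 0 need not hold
		 * the lowest address and the last unit need not hold the
		 * highest. */
		long unit_off[] = { 0, -8192, 4096, 12288 };
		int low, high;

		find_low_high_units(unit_off, 4, &low, &high);
		printf("low unit %d, high unit %d\n", low, high); /* 1, 3 */
		return 0;
	}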