Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON() falsely

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2016/10/14 7:29, Tejun Heo wrote:
> On Tue, Oct 11, 2016 at 10:00:28PM +0800, zijun_hu wrote:
>> From: zijun_hu <zijun_hu@xxxxxxx>
>>
>> as shown by pcpu_build_alloc_info(), the number of units within a percpu
>> group is educed by rounding up the number of CPUs within the group to
>> @upa boundary, therefore, the number of CPUs isn't equal to the units's
>> if it isn't aligned to @upa normally. however, pcpu_page_first_chunk()
>> uses BUG_ON() to assert one number is equal the other roughly, so a panic
>> is maybe triggered by the BUG_ON() falsely.
>>
>> in order to fix this issue, the number of CPUs is rounded up then compared
>> with units's, the BUG_ON() is replaced by warning and returning error code
>> as well to keep system alive as much as possible.
> 
> I really can't decode what the actual issue is here.  Can you please
> give an example of a concrete case?
> 
the right relationship between the number of CPUs @nr_cpus within a percpu group
and the number of unites @nr_units within the same group is that
@nr_units == roundup(@nr_cpus, @upa);

the process of consideration is shown as follows:

1) current code segments:

BUG_ON(ai->nr_groups != 1);
BUG_ON(ai->groups[0].nr_units != num_possible_cpus());

2) changes for considering the right relationship between the number of CPUs and units 

BUG_ON(ai->nr_groups != 1);
BUG_ON(ai->groups[0].nr_units != roundup(num_possible_cpus(), @upa));

3) replace BUG_ON() by warning and returning error code since it seems BUG_ON() isn't
   nice as shown by linus recent LKML mail

BUG_ON(ai->nr_groups != 1);
if (ai->groups[0].nr_units != roundup(num_possible_cpus(), @upa))
   return -EINVAL;

so 3) is my finial changes;
for the relationship of both numbers : see the reply for andrew

>> @@ -2113,21 +2120,22 @@ int __init pcpu_page_first_chunk(size_t reserved_size,
>>  
>>  	/* allocate pages */
>>  	j = 0;
>> -	for (unit = 0; unit < num_possible_cpus(); unit++)
>> +	for (unit = 0; unit < num_possible_cpus(); unit++) {
>> +		unsigned int cpu = ai->groups[0].cpu_map[unit];
>>  		for (i = 0; i < unit_pages; i++) {
>> -			unsigned int cpu = ai->groups[0].cpu_map[unit];
>>  			void *ptr;
>>  
>>  			ptr = alloc_fn(cpu, PAGE_SIZE, PAGE_SIZE);
>>  			if (!ptr) {
>>  				pr_warn("failed to allocate %s page for cpu%u\n",
>> -					psize_str, cpu);
>> +						psize_str, cpu);
> 
> And stop making gratuitous changes?
>

this changes is just for looking nicer instinctively
@cpu can be determined in the first outer loop.

> Thanks.
> 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]