On 11/18/2011 12:40 AM, Tim Hartrick wrote:
>
> Dave, Tejun, Americo,
>
> Attached find three configs:
>
> Ubuntu 2.6.32-21-server - works
> Ubuntu 2.6.38-8-server - fails
> Ubuntu 3.3.1-030101-generic (stable) - fails

Thanks, Tim

>
> On Thu, 2011-11-17 at 15:21 +0800, Dave Young wrote:
>> On 11/17/2011 01:22 PM, Tim Hartrick wrote:
>>
>>> Tejun, Dave,
>>>
>>> I will be happy to answer any questions about our environment or test
>>> debug or other patches. Just tell me what you need.
>>
>>
>> Thank you. Can you share your kernel config?
>>
>>>
>>> tim
>>>
>>> On Nov 16, 2011 8:44 PM, "Dave Young" <dyoung at redhat.com> wrote:
>>>
>>> On 11/17/2011 12:34 PM, Tejun Heo wrote:
>>>
>>> > Hello,
>>> >
>>> > On Wed, Nov 16, 2011 at 7:30 PM, Dave Young <dyoung at redhat.com> wrote:
>>> >> This addr is converted to an invalid phys address,
>>> >
>>> > I'm a bit lost on the context here. Who's calling per_cpu_ptr_to_phys()?
>>>
>>>
>>> It's drivers/base/cpu.c : show_crash_notes()
>>>
>>> >
>>> >> looking the code below:
>>> >>         if (in_first_chunk) {
>>> >>                 if (!is_vmalloc_addr(addr))
>>> >>                         return __pa(addr);
>>> >>                 else
>>> >>                         return page_to_phys(vmalloc_to_page(addr));
>>> >>         } else
>>> >>                 return page_to_phys(pcpu_addr_to_page(addr));
>>> >>
>>> >> I dont understand per cpu allocation well, if addr is not in first chunk
>>> >> then it should be in vmalloc area?
>>> >
>>> > Yes, it is. First chunk can be embedded in the kernel linear address
>>> > space but from the second one, it's always set up from the top of the
>>> > vmalloc area with the same offset layout as the first chunk.
>>>
>>>
>>> in this case ffff880667c19ad0 fall out of vmalloc area and it's not in
>>> first chunk also.

Tejun,

With config provided by Tim, I can reproduce this problem on a Dell machine.
I did some debugging on this and found that first_start < first_end, so
the for_each_possible_cpu(cpu) check is never reached.

Why are first_start/first_end wrong? Is pcpu_unit_offsets[] not ordered?
Any idea?

The hack below makes the bug go away, which confirms that the addr is
indeed in the first chunk.

diff --git a/mm/percpu.c b/mm/percpu.c
index bf80e55..8f6eb58 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -984,26 +984,14 @@ phys_addr_t per_cpu_ptr_to_phys(void *addr)
 {
 	void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
 	bool in_first_chunk = false;
-	unsigned long first_start, first_end;
 	unsigned int cpu;
 
-	/*
-	 * The following test on first_start/end isn't strictly
-	 * necessary but will speed up lookups of addresses which
-	 * aren't in the first chunk.
-	 */
-	first_start = pcpu_chunk_addr(pcpu_first_chunk, pcpu_first_unit_cpu, 0);
-	first_end = pcpu_chunk_addr(pcpu_first_chunk, pcpu_last_unit_cpu,
-				    pcpu_unit_pages);
-	if ((unsigned long)addr >= first_start &&
-	    (unsigned long)addr < first_end) {
-		for_each_possible_cpu(cpu) {
-			void *start = per_cpu_ptr(base, cpu);
-
-			if (addr >= start && addr < start + pcpu_unit_size) {
-				in_first_chunk = true;
-				break;
-			}
+	for_each_possible_cpu(cpu) {
+		void *start = per_cpu_ptr(base, cpu);
+
+		if (addr >= start && addr < start + pcpu_unit_size) {
+			in_first_chunk = true;
+			break;
 		}
 	}

>>>
>>> >
>>> >> Tejun, do you have any idea about this?
>>> >
>>> > Can you please tell me how to reproduce the problem? I'll try to find
>>> > out what's going on.
>>>
>>>
>>> make sure kernel support CRASH DUMP, then cat
>>> /sys/devices/system/cpu/cpu[x]/crash_notes
>>>
>>> Tim Hartrick <tim at edgecast.com> reported the problem when test kdump.
>>> But I can not reproduce this. I think tim can help to test
>>>
>>> >
>>> > Thanks.
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks
>>> Dave
>>>
>>
>>
>>
>

--
Thanks
Dave
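
For illustration only, here is a minimal userspace sketch of how a range pre-check like the one removed by the hack could reject an address that really is in the first chunk. It assumes the per-cpu units are not laid out in cpu order, which is only a guess at this point. The offsets, sizes and helper names (unit_offsets, unit_start, in_first_chunk_scan, in_first_chunk_prefiltered) are made up for the example and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS   4
#define UNIT_SIZE 0x1000UL

/* Hypothetical layout: unit offsets indexed by cpu, NOT sorted by address. */
static const uintptr_t unit_offsets[NR_CPUS] = { 0x2000, 0x3000, 0x0000, 0x1000 };
static const uintptr_t base_addr = 0x100000;

/* Start address of the unit belonging to @cpu. */
static uintptr_t unit_start(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return base_addr + unit_offsets[cpu];
}

/* Plain scan over every unit, as the hack does. */
static bool in_first_chunk_scan(uintptr_t addr)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		uintptr_t start = unit_start(cpu);

		if (addr >= start && addr < start + UNIT_SIZE)
			return true;
	}
	return false;
}

/*
 * Same scan, but guarded by a [first_start, first_end) range built from
 * the units of @first_cpu and @last_cpu. The guard is only safe if those
 * two cpus own the lowest and highest unit addresses.
 */
static bool in_first_chunk_prefiltered(uintptr_t addr,
				       unsigned int first_cpu,
				       unsigned int last_cpu)
{
	uintptr_t first_start = unit_start(first_cpu);
	uintptr_t first_end = unit_start(last_cpu) + UNIT_SIZE;

	if (addr < first_start || addr >= first_end)
		return false;
	return in_first_chunk_scan(addr);
}
```

With this made-up layout, picking cpu 0 and cpu 3 as first/last gives an empty range, so an address inside cpu 2's unit is rejected by the guard even though the plain scan finds it. Picking the cpus with the lowest and highest unit addresses (2 and 1 here) makes both versions agree.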