On 11/18/2011 04:43 PM, Dave Young wrote:
> On 11/18/2011 12:40 AM, Tim Hartrick wrote:
> 
>>
>> Dave, Tejun, Americo,
>>
>> Attached are three configs:
>>
>> Ubuntu 2.6.32-21-server - works
>> Ubuntu 2.6.38-8-server - fails
>> Ubuntu 3.3.1-030101-generic (stable) - fails
> 
> 
> Thanks, Tim
> 
>>
>> On Thu, 2011-11-17 at 15:21 +0800, Dave Young wrote:
>>> On 11/17/2011 01:22 PM, Tim Hartrick wrote:
>>>
>>>> Tejun, Dave,
>>>>
>>>> I will be happy to answer any questions about our environment or test
>>>> debug or other patches. Just tell me what you need.
>>>
>>>
>>> Thank you. Can you share your kernel config?
>>>
>>>>
>>>> tim
>>>>
>>>> On Nov 16, 2011 8:44 PM, "Dave Young" <dyoung at redhat.com> wrote:
>>>>
>>>>     On 11/17/2011 12:34 PM, Tejun Heo wrote:
>>>>
>>>>     > Hello,
>>>>     >
>>>>     > On Wed, Nov 16, 2011 at 7:30 PM, Dave Young <dyoung at redhat.com> wrote:
>>>>     >> This addr is converted to an invalid phys address,
>>>>     >
>>>>     > I'm a bit lost on the context here. Who's calling per_cpu_ptr_to_phys()?
>>>>
>>>>
>>>>     It's show_crash_notes() in drivers/base/cpu.c.
>>>>
>>>>     >
>>>>     >> looking at the code below:
>>>>     >>     if (in_first_chunk) {
>>>>     >>             if (!is_vmalloc_addr(addr))
>>>>     >>                     return __pa(addr);
>>>>     >>             else
>>>>     >>                     return page_to_phys(vmalloc_to_page(addr));
>>>>     >>     } else
>>>>     >>             return page_to_phys(pcpu_addr_to_page(addr));
>>>>     >>
>>>>     >> I don't understand per-cpu allocation well. If addr is not in the
>>>>     >> first chunk, then it should be in the vmalloc area?
>>>>     >
>>>>     > Yes, it is. The first chunk can be embedded in the kernel linear address
>>>>     > space, but from the second one on, it's always set up from the top of the
>>>>     > vmalloc area with the same offset layout as the first chunk.
>>>>
>>>>
>>>>     In this case ffff880667c19ad0 falls outside the vmalloc area, and it's
>>>>     not in the first chunk either.
> 
> 
> Tejun,
> 
> With the config provided by Tim, I can reproduce this problem on a Dell
> machine.
> I did some debugging on this and found that fisrt_start <
> first_end

Typo: I meant first_start > first_end, so the for_each_possible_cpu(cpu)
loop is never reached.

> 
> Why are first_start/first_end wrong? Is pcpu_unit_offsets[] not
> ordered? Any ideas?
> 
> The hack below makes the bug go away, which confirms that the addr is
> indeed in the first chunk.
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index bf80e55..8f6eb58 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -984,26 +984,14 @@ phys_addr_t per_cpu_ptr_to_phys(void *addr)
>  {
>  	void __percpu *base = __addr_to_pcpu_ptr(pcpu_base_addr);
>  	bool in_first_chunk = false;
> -	unsigned long first_start, first_end;
>  	unsigned int cpu;
>  
> -	/*
> -	 * The following test on first_start/end isn't strictly
> -	 * necessary but will speed up lookups of addresses which
> -	 * aren't in the first chunk.
> -	 */
> -	first_start = pcpu_chunk_addr(pcpu_first_chunk, pcpu_first_unit_cpu, 0);
> -	first_end = pcpu_chunk_addr(pcpu_first_chunk, pcpu_last_unit_cpu,
> -				    pcpu_unit_pages);
> -	if ((unsigned long)addr >= first_start &&
> -	    (unsigned long)addr < first_end) {
> -		for_each_possible_cpu(cpu) {
> -			void *start = per_cpu_ptr(base, cpu);
> -
> -			if (addr >= start && addr < start + pcpu_unit_size) {
> -				in_first_chunk = true;
> -				break;
> -			}
> +	for_each_possible_cpu(cpu) {
> +		void *start = per_cpu_ptr(base, cpu);
> +
> +		if (addr >= start && addr < start + pcpu_unit_size) {
> +			in_first_chunk = true;
> +			break;
>  		}
>  	}
>  
>>>>
>>>>     >
>>>>     >> Tejun, do you have any idea about this?
>>>>     >
>>>>     > Can you please tell me how to reproduce the problem? I'll try to find
>>>>     > out what's going on.
>>>>
>>>>
>>>>     Make sure the kernel supports CRASH DUMP, then cat
>>>>     /sys/devices/system/cpu/cpu[x]/crash_notes
>>>>
>>>>     Tim Hartrick <tim at edgecast.com> reported the problem when testing
>>>>     kdump, but I cannot reproduce it. I think Tim can help to test.
>>>>
>>>>     >
>>>>     > Thanks.
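A userspace sketch of the address check discussed above. It assumes the x86_64 virtual memory layout documented for kernels of this era (4-level paging, direct mapping starting at ffff880000000000, vmalloc/ioremap area ffffc90000000000-ffffe8ffffffffff, no KASLR). The SKETCH_* constants and the helper names are hypothetical and were not read from the affected machine. Under these assumptions ffff880667c19ad0 lies in the direct mapping. That fits the finding that the address belongs to an embedded first chunk and should have gone through __pa() rather than pcpu_addr_to_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical constants: the x86_64 layout documented for kernels of
 * this era (4-level paging, no KASLR). Not read from a running kernel.
 */
#define SKETCH_PAGE_OFFSET   0xffff880000000000ULL /* start of direct mapping */
#define SKETCH_DIRECT_END    0xffffc7ffffffffffULL /* last direct-map byte */
#define SKETCH_VMALLOC_START 0xffffc90000000000ULL /* first vmalloc byte */
#define SKETCH_VMALLOC_END   0xffffe8ffffffffffULL /* last vmalloc byte */

/* True if addr falls in the linear (direct) mapping of physical memory. */
bool in_direct_map(uint64_t addr)
{
	return addr >= SKETCH_PAGE_OFFSET && addr <= SKETCH_DIRECT_END;
}

/* True if addr falls in the vmalloc/ioremap area (cf. is_vmalloc_addr()). */
bool in_vmalloc_area(uint64_t addr)
{
	return addr >= SKETCH_VMALLOC_START && addr <= SKETCH_VMALLOC_END;
}

/*
 * Userspace analogue of __pa() for a direct-map address: the physical
 * address is simply the offset from the start of the direct mapping.
 */
uint64_t direct_map_pa(uint64_t addr)
{
	assert(in_direct_map(addr));
	return addr - SKETCH_PAGE_OFFSET;
}
```

Under this assumed layout, direct_map_pa(0xffff880667c19ad0ULL) evaluates to 0x667c19ad0, while in_vmalloc_area() returns false for the same address.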
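The first_start > first_end observation can be reproduced with a small userspace model of the fast-path bounds check. The model is hypothetical: unit_off[i] stands in for pcpu_unit_offsets[] (the byte offset of unit i from the chunk base), and the first and last units stand in for pcpu_first_unit_cpu and pcpu_last_unit_cpu. If the offsets do not increase with the unit index, bounds taken from the first and last units can invert, so no address passes the test. Taking the minimum and maximum offset over all units is one order-independent alternative, shown here only as an illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open address range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Mirrors the removed fast path: bounds come from the first and last
 * units, which is only correct if offsets increase with the unit index.
 */
struct range bounds_by_unit_order(unsigned long base,
				  const unsigned long *unit_off,
				  size_t nr_units, unsigned long unit_size)
{
	assert(nr_units > 0);
	struct range r = {
		.start = base + unit_off[0],
		.end   = base + unit_off[nr_units - 1] + unit_size,
	};
	return r;
}

/* Order-independent bounds: lowest and highest unit offsets. */
struct range bounds_by_offset(unsigned long base,
			      const unsigned long *unit_off,
			      size_t nr_units, unsigned long unit_size)
{
	size_t lo = 0, hi = 0;

	assert(nr_units > 0);
	for (size_t i = 1; i < nr_units; i++) {
		if (unit_off[i] < unit_off[lo])
			lo = i;
		if (unit_off[i] > unit_off[hi])
			hi = i;
	}
	struct range r = {
		.start = base + unit_off[lo],
		.end   = base + unit_off[hi] + unit_size,
	};
	return r;
}

/* Same shape as the kernel test: start <= addr < end. */
int in_range(struct range r, unsigned long addr)
{
	return addr >= r.start && addr < r.end;
}
```

With hypothetical offsets {0x200, 0x300, 0x100, 0x000}, base 0x1000 and unit size 0x100, bounds_by_unit_order() yields start 0x1200 and end 0x1100. An address inside the third unit, such as 0x1140, is rejected before the per-CPU loop runs. bounds_by_offset() yields [0x1000, 0x1400) and accepts it. The hack in the diff sidesteps the problem differently, by dropping the fast-path check and always walking every possible CPU.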
-- 
Thanks
Dave