Re: help debug number of CPU detect failure

Santosh <ysan99@xxxxxxxxx> · Mon, 9 Mar 2020 14:52:29 -0700

On Fri, Mar 6, 2020 at 6:13 AM Dave Anderson <anderson@xxxxxxxxxx> wrote:
>
>
>
> ----- Original Message -----
> > On Thu, Mar 5, 2020 at 1:07 PM Santosh <ysan99@xxxxxxxxx> wrote:
> > >
> > > On Thu, Mar 5, 2020 at 12:54 PM Dave Anderson <anderson@xxxxxxxxxx> wrote:
> > > >
> > > > > > I suspect that it's a problem with either the --kaslr offset and/or
> > > > > > the phys_base value that you have used.
> > > > >
> > > > > Is there a method to find or print the KASLR offset and phys_base
> > > > > on a running Linux system?
> > > >
> > > > They are normally passed in the VMCOREINFO data that is contained in an
> > > > ELF PT_NOTE in the dumpfile header.  For example, here's a dump of the
> > > > normal VMCOREINFO data, where the phys_base and KASLR offsets are down
> > > > near the bottom:
> > > >
> > > >                       OSRELEASE=4.18.0-185.el8.x86_64
> > > >                       PAGESIZE=4096
> > > >                       SYMBOL(init_uts_ns)=ffffffffbd812540
> > > >                       SYMBOL(node_online_map)=ffffffffbda0f520
> > > >                       SYMBOL(swapper_pg_dir)=ffffffffbd80a000
> > > >                       SYMBOL(_stext)=ffffffffbc600000
> > > >                       SYMBOL(vmap_area_list)=ffffffffbd8d78b0
> > > >                       SYMBOL(mem_section)=ffff956a3ffd2000
> > > >                       LENGTH(mem_section)=2048
> > > >                       SIZE(mem_section)=16
> > > >                       OFFSET(mem_section.section_mem_map)=0
> > > >                       SIZE(page)=64
> > > >                       SIZE(pglist_data)=171968
> > > >                       SIZE(zone)=1472
> > > >                       SIZE(free_area)=88
> > > >                       SIZE(list_head)=16
> > > >                       SIZE(nodemask_t)=128
> > > >                       OFFSET(page.flags)=0
> > > >                       OFFSET(page._refcount)=52
> > > >                       OFFSET(page.mapping)=24
> > > >                       OFFSET(page.lru)=8
> > > >                       OFFSET(page._mapcount)=48
> > > >                       OFFSET(page.private)=40
> > > >                       OFFSET(page.compound_dtor)=16
> > > >                       OFFSET(page.compound_order)=17
> > > >                       OFFSET(page.compound_head)=8
> > > >                       OFFSET(pglist_data.node_zones)=0
> > > >                       OFFSET(pglist_data.nr_zones)=171232
> > > >                       OFFSET(pglist_data.node_start_pfn)=171240
> > > >                       OFFSET(pglist_data.node_spanned_pages)=171256
> > > >                       OFFSET(pglist_data.node_id)=171264
> > > >                       OFFSET(zone.free_area)=192
> > > >                       OFFSET(zone.vm_stat)=1296
> > > >                       OFFSET(zone.spanned_pages)=112
> > > >                       OFFSET(free_area.free_list)=0
> > > >                       OFFSET(list_head.next)=0
> > > >                       OFFSET(list_head.prev)=8
> > > >                       OFFSET(vmap_area.va_start)=0
> > > >                       OFFSET(vmap_area.list)=48
> > > >                       LENGTH(zone.free_area)=11
> > > >                       SYMBOL(log_buf)=ffffffffbd85b140
> > > >                       SYMBOL(log_buf_len)=ffffffffbd85b13c
> > > >                       SYMBOL(log_first_idx)=ffffffffbe319778
> > > >                       SYMBOL(clear_idx)=ffffffffbe319744
> > > >                       SYMBOL(log_next_idx)=ffffffffbe319768
> > > >                       SIZE(printk_log)=16
> > > >                       OFFSET(printk_log.ts_nsec)=0
> > > >                       OFFSET(printk_log.len)=8
> > > >                       OFFSET(printk_log.text_len)=10
> > > >                       OFFSET(printk_log.dict_len)=12
> > > >                       LENGTH(free_area.free_list)=5
> > > >                       NUMBER(NR_FREE_PAGES)=0
> > > >                       NUMBER(PG_lru)=5
> > > >                       NUMBER(PG_private)=12
> > > >                       NUMBER(PG_swapcache)=9
> > > >                       NUMBER(PG_swapbacked)=18
> > > >                       NUMBER(PG_slab)=8
> > > >                       NUMBER(PG_hwpoison)=22
> > > >                       NUMBER(PG_head_mask)=32768
> > > >                       NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-129
> > > >                       NUMBER(HUGETLB_PAGE_DTOR)=2
> > > >                       NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE)=-257
> > > >    ===============>   NUMBER(phys_base)=16437477376
> > > >                       SYMBOL(init_top_pgt)=ffffffffbd80a000
> > > >                       NUMBER(pgtable_l5_enabled)=0
> > > >                       SYMBOL(node_data)=ffffffffbda0ad20
> > > >                       LENGTH(node_data)=1024
> > > >    ===============>   KERNELOFFSET=3b600000
> > > >                       NUMBER(KERNEL_IMAGE_SIZE)=1073741824
> > > >                       NUMBER(sme_mask)=0
> > > >                       CRASHTIME=1583350919
> > > >
> > > > But in your Azure-generated dumpfile, I note that each cpu's NT_PRSTATUS
> > > > note contains junk data, and while it does have a VMCOREINFO note, it
> > > > contains this:
> > > >
> > > > Elf64_Nhdr:
> > > >                n_namesz: 11 ("VMCOREINFO")
> > > >                n_descsz: 42
> > > >                  n_type: 0 (unused)
> > > >                          FAKE1=IGNORE1
> > > >                          FAKE2=IGNORE2
> > > >                          FAKE3=IGNORE3
> > > >
> > > > So that's why you need to pass in the two arguments.
> > > >
> > > > Now, the crash utility should be able to be brought up successfully
> > > > on a live system without passing the arguments.  And once you've done
> > > > that, you could get the values like this:
> > > >
> > > >   crash> help -m | grep phys_base
> > > >                   phys_base: 3d3c00000
> > > >   crash> help -k | grep relocate
> > > >         relocate: ffffffffc4a00000  (KASLR offset: 3b600000 / 950MB)
> > > >   crash>
> > > >
> > > > But since they change with each reboot, you would have to capture them
> > > > while running on the live system, and save them somewhere for a
> > > > subsequent crash.  So that goes back to my question -- how did you get
> > > > the numbers that you used?
> > >
> > > I got the numbers by simply grepping through the coredump's strings:
> > > $ strings vm1_numa_4gb_5cpu.coredump | grep -v strings | grep 'KERNELOFFSET=\|NUMBER(phys_base)='
> > >
> > > Machine is still running and I cross verified those numbers with crash
> > > and those were correct.
> > >
> > > crash> p vmcoreinfo_data+1600
> > > $1 = (unsigned char *) 0xffff917d3cde1640
> > > "poison)=22\nNUMBER(PG_head_mask)=32768\nNUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128\nNUMBER(HUGETLB_PAGE_DTOR)=2\nNUMBER(phys_base)=4355784704\nSYMBOL(init_top_pgt)=ffffffff82a0a000\nSYMBOL(node_data)=ffffffff82c5d780\nLENGTH(node_data)=1024\nKERNELOFFSET=600000\nNUMBER"...
> > >
> > > Now it appears to me that something is wrong in the Azure-generated dump file.
> >
> > Something to do with numa:
> >
> > santosh@u1804lts:~$ cat /proc/sys/kernel/numa_balancing
> > 1
> >
> > HyperV VM with 1 numa node (numa_balancing = 0) -- Linux with nokaslr
> > -- vm2core -- ELF coredump -- crash tool -- Ok
> > HyperV VM with 1 numa node (numa_balancing = 0) -- Linux with kaslr --
> > vm2core -- ELF coredump -- crash tool -- Ok
> > HyperV VM with 2 numa nodes (numa_balancing = 1) -- Linux with nokaslr
> > -- vm2core -- ELF coredump -- crash tool -- Ok
> > HyperV VM with 2 numa nodes (numa_balancing = 1) -- Linux with kaslr
> > -- vm2core -- ELF coredump -- crash tool -- Not ok
> >
> > Do we have to specify the NUMA topology to the crash tool somehow, or
> > should it already be handled in the coredump file?
>
> Definitely not.  The crash utility is only interested in:
>
>   1. kernel virtual address values -- which KASLR modifies from the values
>      compiled into the vmlinux file,
>   2. translating those kernel virtual addresses into physical addresses, and
>   3. accessing those physical addresses from the memory source.
>
> As I understand it, numa_balancing is concerned with user-space virtual
> address mapping, where the kernel may re-map an underlying physical
> address from one NUMA node to another.  User-space memory is never
> accessed by the crash utility unless requested by a run-time command
> that specifically specifies it.
>
> Dave

Hi Dave,

I did some more experiments and found that it has nothing to do with NUMA.

I also found that the issue is resolved when I insert a
"SYMBOL(_stext)=" entry into the vmcoreinfo.
So it seems crash sometimes needs the _stext value in addition to the
KASLR offset and phys_base.
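
For anyone checking their own dump, here is a minimal Python sketch of that
check. It is not part of crash; the function names and the list of required
keys are my own assumptions, based only on what was observed in this thread.
It parses VMCOREINFO-style "KEY=VALUE" text, reports which of the three
entries are missing, and converts phys_base (decimal) and KERNELOFFSET (hex
without a 0x prefix) to integers. The sample values are the ones from Dave's
example dump quoted above.

```python
# Keys this thread found relevant (an assumption drawn from the discussion,
# not a documented requirement of the crash utility).
REQUIRED_KEYS = ("SYMBOL(_stext)", "KERNELOFFSET", "NUMBER(phys_base)")


def parse_vmcoreinfo(text):
    """Split VMCOREINFO 'KEY=VALUE' lines into a dict of strings."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip lines that carry no '='
            entries[key] = value
    return entries


def missing_keys(entries):
    """Return the required keys that are absent from the parsed note."""
    return [key for key in REQUIRED_KEYS if key not in entries]


def summarize(entries):
    """Convert the entries that are present to integers."""
    summary = {}
    if "NUMBER(phys_base)" in entries:
        summary["phys_base"] = int(entries["NUMBER(phys_base)"])    # decimal
    if "KERNELOFFSET" in entries:
        summary["kaslr_offset"] = int(entries["KERNELOFFSET"], 16)  # bare hex
    if "SYMBOL(_stext)" in entries:
        summary["_stext"] = int(entries["SYMBOL(_stext)"], 16)      # bare hex
    return summary


good = parse_vmcoreinfo(
    "SYMBOL(_stext)=ffffffffbc600000\n"
    "NUMBER(phys_base)=16437477376\n"
    "KERNELOFFSET=3b600000\n"
)
fake = parse_vmcoreinfo("FAKE1=IGNORE1\nFAKE2=IGNORE2\nFAKE3=IGNORE3\n")

print(missing_keys(good))                   # []
print(hex(summarize(good)["phys_base"]))    # 0x3d3c00000
print(hex(summarize(good)["kaslr_offset"])) # 0x3b600000
print(missing_keys(fake))                   # all three keys are missing
```

The hex form of phys_base matches what "help -m" printed in Dave's example
(3d3c00000), which is a quick way to confirm that the decimal value taken
from the strings output is the same number crash reports.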

Thanks,
Santosh
>
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility
>
