Re: Can't read stack contents from qemu dump

Dave Anderson <anderson@xxxxxxxxxx> · Wed, 4 Apr 2018 10:48:56 -0400 (EDT)

----- Original Message -----
> Hello,
> 
> I tried running crash-head (HEAD: 5d172b230cf4) against today's linus'
> master on a dump obtained via dump-guest-memory in qemu. And I got the
> following when the image is loaded:
> 
> please wait... (determining panic task)
> bt: read error: kernel virtual address: fffffe0000007000  type: "stack
> contents"
> 
>   KERNEL: vmlinux
>     DUMPFILE: memory-verbatim.img
>         CPUS: 1
>         DATE: Wed Apr  4 16:36:47 2018
>       UPTIME: 00:27:48
> LOAD AVERAGE: 31.11, 17.80, 10.43
>        TASKS: 145
>     NODENAME: ubuntu-virtual
>      RELEASE: 4.16.0-rc7-nbor
>      VERSION: #570 SMP Wed Apr 4 16:03:44 EEST 2018
>      MACHINE: x86_64  (3392 Mhz)
>       MEMORY: 4 GB
>        PANIC: ""
>          PID: 0
>      COMMAND: "swapper/0"
>         TASK: ffffffff82016500  [THREAD_INFO: ffffffff82016500]
>          CPU: 0
>        STATE: TASK_RUNNING
>      WARNING: panic task not found
> 
> crash> bt
> PID: 0      TASK: ffffffff82016500  CPU: 0   COMMAND: "swapper/0"
>  #0 [ffffffff82003dc8] __schedule at ffffffff817ea059
> bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop: ffffffff82000000/ffffffff82002000 cpu: 0
> 
> 
> So the kernel has been compiled with : gcc (Ubuntu
> 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 which has retpoline enabled.
> 
> I have KASLR disabled: # CONFIG_RANDOMIZE_BASE is not set and the kernel
> is compiled with CONFIG_FRAME_POINTER=y .
> 
> This scenario used to work around the 4.10 timeline. Am I doing
> something wrong or crash still needs time to work on the latest upstream
> kernel code?

Presumably the latter. 

If you do a "task -R stack ffffffff82016500", I'm presuming that it
shows the stack base address is ffffffff82000000.  And looking at
the stackbase/stacktop values, the crash utility is presuming an 8K stack:

 bt: invalid RSP: ffffffff82003dc8  bt->stackbase/stacktop: ffffffff82000000/ffffffff82002000 cpu: 0

But the RSP is ffffffff82003dc8, which puts it beyond the 8K stack size,
so I'm presuming that the kernel is actually using 16K stacks.  The most
recent kernel I have is 4.16.0-0.rc6.git3.1.fc29.x86_64, which uses 16K stacks.
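
To make the arithmetic concrete, here is a minimal sketch using the two
addresses from the error message.  The helper name rsp_in_stack() is
made up for illustration and is not a crash utility function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Addresses copied from the "invalid RSP" message above. */
#define STACKBASE UINT64_C(0xffffffff82000000)
#define RSP       UINT64_C(0xffffffff82003dc8)

/*
 * Hypothetical helper (not part of crash): returns true when rsp
 * lies inside the half-open range [base, base + size).
 */
static bool rsp_in_stack(uint64_t rsp, uint64_t base, uint64_t size)
{
    assert(size > 0);
    return rsp >= base && (rsp - base) < size;
}
```

The offset RSP - STACKBASE is 0x3dc8 (15816 bytes), so
rsp_in_stack(RSP, STACKBASE, 8192) is false while
rsp_in_stack(RSP, STACKBASE, 16384) is true, which is what you would
expect if the kernel's stacks are 16K.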

Here is how the crash utility determines the stack size.  The x86_64 stacksize
starts out with a default size of 2 pages, as set here in x86_64_init(PRE_SYMTAB):

        case PRE_SYMTAB:
                ... [ cut ] ...
                machdep->stacksize = machdep->pagesize * 2;
                ...

Then later on in task_init(), it gets resized as shown here, where 
the STACKSIZE() macro is machdep->stacksize:

        if (VALID_SIZE(task_union) && (SIZE(task_union) != STACKSIZE())) {
                error(WARNING, "\nnon-standard stack size: %ld\n",
                        len = SIZE(task_union));
                machdep->stacksize = len;
        } else if (VALID_SIZE(thread_union) &&
                ((len = SIZE(thread_union)) != STACKSIZE()))
                machdep->stacksize = len;

The "task_union" no longer exists, so the first test fails, and it
then checks whether the size of "thread_union" differs from the default
stacksize, and if so, resets the size to match.
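
As a rough model of that decision, here is a standalone sketch.  The
names pick_stacksize() and SIZE_UNKNOWN are my own stand-ins for the
SIZE()/VALID_SIZE() machinery, and the warning message is left out, so
treat it as an illustration of the logic rather than the actual code:

```c
#include <assert.h>

/* Hypothetical marker meaning "type not found in the debuginfo". */
#define SIZE_UNKNOWN (-1L)

/*
 * Hypothetical stand-in for the task_init() logic quoted above:
 * returns the stack size that would be in effect afterwards.
 */
static long pick_stacksize(long dflt, long task_union_size,
                           long thread_union_size)
{
    assert(dflt > 0);

    /* First test: task_union is known and differs from the default. */
    if (task_union_size != SIZE_UNKNOWN && task_union_size != dflt)
        return task_union_size;

    /* Second test: thread_union is known and differs from the default. */
    if (thread_union_size != SIZE_UNKNOWN && thread_union_size != dflt)
        return thread_union_size;

    /* Neither test matched: the PRE_SYMTAB default stays in place. */
    return dflt;
}
```

With a default of 8192, no task_union, and a 16384-byte thread_union,
this returns 16384.  If the thread_union size cannot be determined
either, it returns 8192, so under this model that would be one way to
end up with the 8K stackbase/stacktop range shown in the error.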

On my 4.16.0-0.rc6.git3.1.fc29.x86_64 kernel, here is the thread_union:

  crash> thread_union
  union thread_union {
      struct task_struct task;
      unsigned long stack[2048];
  }
  SIZE: 16384

And so it gets reset:

  crash> help -m | grep stacksize
            stacksize: 16384
  crash>

You can debug it from there.  Let me know what you find.

Thanks,
  Dave
