Re: About kexec issues in AWS nitro instances (RH bz 1758323)

Hi,

On Mon, Mar 2, 2020 at 1:39 PM Dave Young <dyoung@xxxxxxxxxx> wrote:
>
> On 03/02/20 at 12:20am, Bhupesh Sharma wrote:
> > Hi Guilherme,
> >
> > On Sat, Feb 29, 2020 at 10:37 PM Guilherme G. Piccoli
> > <gpiccoli@xxxxxxxxxxxxx> wrote:
> > >
> > > Hi Bhupesh and Dave (and everybody CC'ed here), I'm Guilherme Piccoli
> > > and I'm working on the same issue observed in RH bugzilla 1758323 [0] -
> > > or at least, it seems to be the same heh
> >
> > Ok.
> >
> > > The reported issue in my case was that the 2nd kexec fails on Nitro
> > > instances, and indeed it's reproducible. More than this, it shows as an
> > > initrd corruption. I've found 2 workarounds, using the "new" kexec
> > > syscall (by doing kexec -s -l) and keep the initrd memory "un-freed",
> > > using the kernel parameter "retain_initrd".
> >
> > I have a couple of questions:
> > - How do you conclude that you see an initrd corruption across kexec?
> > Do you print the initial hex contents of initrd across kexec?
>
> I'm also interested if any of you can dump the initrd memory in kernel
> printk log, and then save to somewhere to compare with the original
> initrd content.

I ran several overnight tests on the AWS machine and can confirm that
the kexec reboot failure still occurs (after multiple tries) even with
'retain_initrd' in the kernel bootargs, or when using kexec_file_load
('kexec -s -l') instead of plain kexec_load ('kexec -l').

Here are my observations:

1. Adding 'retain_initrd' to the bootargs helps delay the kexec
reboot failure (when successive kexec reboots are executed), but the
(possible?) initrd corruption is still seen (as per the panic logs
from the kexec kernel).

2. I printed the first 4M of the initrd via kernel code (in both the
primary and the kexec kernel, see
<https://bugzilla.redhat.com/attachment.cgi?id=1667523> and
<https://bugzilla.redhat.com/attachment.cgi?id=1667521>) and,
interestingly, the first 4M of contents are _exactly_ identical in the
primary and kexec kernels, even though we see a (possible?) initrd
corruption. See the logs below from the kexec kernel in the panic case:

[    4.229170] Call Trace:
[    4.234379]  dump_stack+0x5c/0x80
[    4.239840]  panic+0xe7/0x2a9
[    4.245291]  do_exit.cold.22+0x59/0x81
[    4.251025]  do_group_exit+0x3a/0xa0
[    4.256784]  __x64_sys_exit_group+0x14/0x20
[    4.262905]  do_syscall_64+0x5b/0x1a0
[    4.268537]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[    4.275784] RIP: 0033:0x7ff749106e2e
[    4.281469] Code: Bad RIP value.
[    4.286981] RSP: 002b:00007fffb6d707f8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[    4.298381] RAX: ffffffffffffffda RBX: 00007ff74910f528 RCX: 00007ff749106e2e
[    4.305616] RDX: 000000000000007f RSI: 000000000000003c RDI: 000000000000007f
[    4.313064] RBP: 00007ff749306000 R08: 00000000000000e7 R09: 00007fffb6d70708
[    4.320369] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
[    4.327671] R13: 0000000000000022 R14: 00007ff749306148 R15: 00007ff749306030
[    4.335396] Kernel Offset: 0x2a400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    4.348002] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
[    4.348002]  ]---
        2020-03-03T09:01:27+00:00

3. So the root cause seems to be something else. I will do some more
debugging to investigate.

4. I added two scripts (via
<https://bugzilla.redhat.com/attachment.cgi?id=1667561> and
<https://bugzilla.redhat.com/attachment.cgi?id=1667560>) which provide
an automated reproducer.

This reproducer can be run on the host machine and launches repeated
kexec reboots on the AWS machine.
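
For readers without access to the attachments, the two load paths
compared in this thread ('kexec -l' and 'kexec -s -l') can be sketched
as below. This is NOT the attached reproducer, only a minimal
illustration of the commands such a loop would issue from the host.
The host name, the kernel and initrd paths, the use of sudo and the
use of --reuse-cmdline are all assumptions made for the example.

```python
import shlex


def kexec_commands(kernel, initrd, use_file_syscall=False):
    """Return the two kexec invocations (load, then execute) as argv lists."""
    load = ["kexec"]
    if use_file_syscall:
        # -s selects the kexec_file_load syscall instead of kexec_load
        load.append("-s")
    # --reuse-cmdline is an assumption: it reuses the running kernel's command line
    load += ["-l", kernel, f"--initrd={initrd}", "--reuse-cmdline"]
    return [load, ["kexec", "-e"]]


def over_ssh(host, argv):
    """Wrap one remote command in an ssh invocation run from the host machine."""
    return ["ssh", host, "sudo " + shlex.join(argv)]


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    for argv in kexec_commands("/boot/vmlinuz", "/boot/initrd.img"):
        print(shlex.join(over_ssh("ec2-user@nitro-test", argv)))
```

A real loop would also have to wait for the instance to come back after
each 'kexec -e' and stop once it no longer does; that part is left out
here.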

Normally, approximately 5-12 runs of the master script (i.e. kexec
reboots) are enough to trigger a panic in the kexec kernel, which
indicates a (possible?) initrd corruption.
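
As a side note on the panic log in point 2: the 'exitcode' value in
the "Attempted to kill init!" line has the same layout as a wait
status, with the exit status in bits 8-15 and the terminating signal
in the low 7 bits. A small sketch of the decoding (the function name
is just for illustration):

```python
def decode_init_exitcode(code):
    """Split a wait-status style exit code into exit status and signal number."""
    return {
        "exit_status": (code >> 8) & 0xFF,  # value passed to exit()/exit_group()
        "signal": code & 0x7F,              # non-zero if init was killed by a signal
    }


print(decode_init_exitcode(0x00007F00))
# prints {'exit_status': 127, 'signal': 0}
```

So init was not killed by a signal; it called exit_group (ORIG_RAX
0xe7 is syscall 231 on x86_64) with status 127, which matches
RDI = 0x7f in the register dump. By convention, status 127 is often
used when a program could not be found or executed, but the log by
itself does not confirm that this is what happened here.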

@Guilherme: Can you please help verify these observations on your setup
(with both the Amazon and upstream kernels) using the automated test
script? Thanks.
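
Since the comparison in point 2 covers only the first 4M, it may also
be worth checking the rest of the image when verifying. The sketch
below assumes the initrd as seen by each kernel has somehow been saved
to a file (how to capture it is not covered here) and reports which
4M-sized chunks differ. The function names are hypothetical.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB, the same size that was dumped in point 2


def chunk_digests(path, chunk_size=CHUNK):
    """Return one SHA-256 hex digest per chunk_size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def differing_chunks(path_a, path_b, chunk_size=CHUNK):
    """Return the indices of chunks that differ between the two files."""
    a = chunk_digests(path_a, chunk_size)
    b = chunk_digests(path_b, chunk_size)
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # If the files differ in length, the extra chunks count as differing too.
    diffs += list(range(min(len(a), len(b)), max(len(a), len(b))))
    return diffs
```

An empty result would mean the two images are identical over their
whole length, not just in the first 4M; a non-empty one narrows down
where to look.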

Regards,
Bhupesh


_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec


