[I'm replying on top of my last message and quoting it in full below,
because it was held in moderation and never got published on the
mailing list, for some reason. If any moderator can make it public, I'd
appreciate it!]

Hi Bhupesh, I re-tested using 5.6-rc4 with "retain_initrd" and
"swiotlb=noforce" and got a quite interesting discrepancy.

The first run got me 99 kexecs with no issue (the public IP of my AWS
instance was 3.215.x.y). After this, I powered the instance off and,
some minutes later, restarted it (the new IP was 34.239.x.y) - guess
what? It failed after the 6th kexec iteration with an oops, which I was
able to collect [0] using pstore.

So, I'm inclined to think that when I restarted the instance (and it got
a different IP, in a different range), it likely got deployed on a
different host, which would explain some of the differences we are
observing across tests. I collected DMI data on both but it didn't show
me any difference - it is feasible, though, to hide host details from
the guest (almost?) completely, so this should be a question for AWS.

Finally, I forgot to mention this in the previous email: you asked me
about testing kexec-tools with commit [1]. I tried that too, but it
didn't help, especially because it affects the "/proc/iomem" memory
read path, whereas kexec-tools uses get_memory_ranges_sysfs() by
default, which reads from the firmware memmap. In the past I tried to
force kexec-tools to read from /proc/iomem, but it didn't help with the
issue. Now I just tried again, forcing the usage of
get_memory_ranges_proc_iomem() with patch [1] merged, but the same issue
reproduces (failure on the 2nd kexec with initrd corruption).

Cheers,

Guilherme

[0] https://pastebin.ubuntu.com/p/fS6c3sPMgk/
[1] http://lists.infradead.org/pipermail/kexec/2020-February/024531.html


On 04/03/2020 16:22, Guilherme G. Piccoli wrote:
> On 04/03/2020 15:39, Bhupesh Sharma wrote:
>> Hi,
>
> Hi Bhupesh, thanks for your prompt and thorough response!
> I managed to do some tests myself, based on your last email, and will
> share my results inline below:
>
>
>>
>> On Mon, Mar 2, 2020 at 1:39 PM Dave Young <dyoung@xxxxxxxxxx> wrote:
>>>>
>>>> I have a couple of questions:
>>>> - How do you conclude that you see an initrd corruption across kexec?
>>>> Do you print the initial hex contents of initrd across kexec?
>>>
>>> I'm also interested if any of you can dump the initrd memory in kernel
>>> printk log, and then save to somewhere to compare with the original
>>> initrd content.
>
> I didn't print it yet, Dave, but it seems Bhupesh did and the 1st 4M are
> the same, right? The way the issue shows up for me is an oops on the 2nd
> kexec (in other words, the 1st kexec from a kexec'ed kernel!), with the
> following message:
>
> "Initramfs unpacking failed: junk in compressed archive"
>
> Also, I've added debug code to the kernel initramfs routines to
> trace-printk file-by-file as they get decompressed; then, by using
> "ftrace_dump_on_oops" I could check the list of files and it's really
> partial (most of the files are not decompressed).
> It usually fails in this if, in flush_buffer() [init/initramfs.c]:
>
> if (c == '0')
>     [...]
> else if (c == 0)
>     [...]
> else
>     [junk]
>
> A print of the 'c' variable at this point shows its value as 6.
> I'm attaching a dmesg here (collected through pstore/ramoops) so you can
> take a look.
>
>
>>
>> I did several overnight tests on the aws machine and can confirm kexec
>> reboot failure issue (multiple tries) can be seen even with
>> 'retain_initrd' in the kernel bootargs or by using kexec_file_load
>> ('kexec -s -l') instead of plain kexec_load ('kexec -l').
>>
>
> I managed to test multiple kexecs in an automated way (using a crontab
> plus a script with a counter in my AWS instance) and you are right,
> after some kexecs it fails.
> My test survived for 70 kexecs, and it
> failed in the end by not jumping into the new kernel / failing really
> early and getting stuck on "kexec_core: Starting new kernel", as you said.
>
> This seems to be a different manifestation of the issue; we seem to
> prevent the usual effect of initrd "corruption" by using the
> "retain_initrd" parameter.
>
> Also, when I added both "retain_initrd" and "swiotlb=noforce" to the
> command-line, the test failed after 10 iterations in a different way -
> it crashed and rebooted to the regular kernel (as I have "oops=panic"
> and "panic=1" in my cmdline), but pstore wasn't enabled in that test, so
> I didn't collect that information (I plan to re-test that).
>
>
>> - Here are my observations:
>>
>> 1. Adding 'retain_initrd' to the bootargs, helps delay the kexec
>> reboot failure (when successive kexec reboots are executed), but the
>> (possible ?) initrd corruption is still seen (as per the panic logs
>> from the kexec kernel).
>>
>> 2. I printed the first 4M of initrd file via kernel code (both in the
>> primary and kexec kernel, see
>> <https://bugzilla.redhat.com/attachment.cgi?id=1667523> and
>> <https://bugzilla.redhat.com/attachment.cgi?id=1667521>) and
>> interestingly the first 4M contents are _exactly_ similar for primary
>> and kexec kernel, even though we see a (possible ?) initrd corruption.
>> See logs below from kexec kernel in case of panic:
>>
>> [    4.229170] Call Trace:
>> [    4.234379]  dump_stack+0x5c/0x80
>> [    4.239840]  panic+0xe7/0x2a9
>> [    4.245291]  do_exit.cold.22+0x59/0x81
>> [    4.251025]  do_group_exit+0x3a/0xa0
>> [    4.256784]  __x64_sys_exit_group+0x14/0x20
>> [    4.262905]  do_syscall_64+0x5b/0x1a0
>> [    4.268537]  entry_SYSCALL_64_after_hwframe+0x65/0xca
>> [    4.275784] RIP: 0033:0x7ff749106e2e
>> [    4.281469] Code: Bad RIP value.
>> [    4.286981] RSP: 002b:00007fffb6d707f8 EFLAGS: 00000206 ORIG_RAX:
>> 00000000000000e7
>> [    4.298381] RAX: ffffffffffffffda RBX: 00007ff74910f528 RCX: 00007ff749106e2e
>> [    4.305616] RDX: 000000000000007f RSI: 000000000000003c RDI: 000000000000007f
>> [    4.313064] RBP: 00007ff749306000 R08: 00000000000000e7 R09: 00007fffb6d70708
>> [    4.320369] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
>> [    4.327671] R13: 0000000000000022 R14: 00007ff749306148 R15: 00007ff749306030
>> [    4.335396] Kernel Offset: 0x2a400000 from 0xffffffff81000000
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [    4.348002] ---[ end Kernel panic - not syncing: Attempted to kill
>> init! exitcode=0x00007f00
>> [    4.348002]  ]---
>> 2020-03-03T09:01:27+00:00
>>
>
> This is really interesting! If you could share the code you used to dump
> the initrd, I can try it in my mainline build with Ubuntu config and
> dump the whole initrd to check if it's the same on regular and kexec'ed
> kernels. I was planning to work on something like this after Dave's
> suggestion...
>
> Also, my oops splat is different from yours (as you can check in the
> attached dmesg); it really seems the initrd "corruption" is just one
> potential side-effect of this issue, and you're observing a different
> failure.
>
>
>> 3. So the root-cause seems to be something else. I will do some more
>> debugging to evaluate the same.
>
> Agreed! I'll debug from here too. I'm considering instrumenting the
> shutdown path and adding "retain_initrd" to see if I can reproduce that
> hang (on "Starting new kernel") and collect more information - the
> difficult part is that when that issue occurs, I can't access the
> console via the AWS interface and pstore won't work in this shutdown
> hang, since it's not an oops event heheh
>
>
>>
>> 4.
>> I added two scripts (via
>> <https://bugzilla.redhat.com/attachment.cgi?id=1667561> and
>> <https://bugzilla.redhat.com/attachment.cgi?id=1667560>) which provide
>> an automated reproducer.
>>
>> This reproducer can be run on the Host machine and launches repeated
>> kexec reboots on the aws machine.
>>
>> Normally approx. 5-12 runs of the master script (i.e. kexec reboots)
>> can lead to a panic in the kexec kernel which indicates a (possible ?)
>> initrd corruption.
>>
>> @Guilherme: Can you please help verify the observations on your setup
>> (both amazon and upstream kernel) using the automated test script?
>
> Thanks for sharing the script! I guess my approach with crontab already
> allowed me to verify your observations, right?
>
> Now, what about "swiotlb=noforce", does it still work for you as a
> workaround for this issue? Do you mind sharing your .config with me,
> so I can try with your exact config to see if, instead of initrd
> "corruption", I'm presented with the same exact signature you got?
>
> Thanks again, I appreciate your collaboration a lot =)
> Cheers,
>
>
> Guilherme
>
>
>
>> Thanks.
>>
>> Regards,
>> Bhupesh
>>
_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec
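
For the offline comparison Dave suggested (saving the initrd contents
seen by each kernel and comparing them with the original), here is a
minimal sketch of how two saved dumps could be compared in userspace.
This is not the code Bhupesh used for his 4M dump, which is not shown in
this thread; the function name first_difference() and the file names in
the usage note are hypothetical, and it assumes both dumps were already
written out as raw binary files.

```python
#!/usr/bin/env python3
"""Report the first byte offset at which two binary dumps differ.

Illustrative sketch only: the helper name and the idea of saving the
initrd as raw files are assumptions, not something taken from the
kernel or from kexec-tools.
"""
from typing import Optional

CHUNK_SIZE = 1 << 20  # compare 1 MiB at a time to keep memory use low


def first_difference(path_a: str, path_b: str,
                     chunk_size: int = CHUNK_SIZE) -> Optional[int]:
    """Return the offset of the first differing byte, or None if the
    two files are identical. If one file is a strict prefix of the
    other, the length of the shorter file is returned."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                return None  # both files ended without any mismatch
            if a != b:
                common = min(len(a), len(b))
                for i in range(common):
                    if a[i] != b[i]:
                        return offset + i
                # All shared bytes match, so one file is shorter.
                return offset + common
            offset += len(a)
```

Something like first_difference("initrd.primary.bin",
"initrd.kexec.bin") (hypothetical file names) would then tell whether
the two dumps diverge anywhere past the first 4M, and at which offset.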