Re: About kexec issues in AWS nitro instances (RH bz 1758323)


Hi Guilherme,

On Mon, Mar 23, 2020 at 8:16 PM Guilherme G. Piccoli
<gpiccoli@xxxxxxxxxxxxx> wrote:
>
> On 22/03/2020 18:16, Bhupesh Sharma wrote:
> > Hello Guilherme,
> >
> > On Fri, Mar 20, 2020 at 9:10 PM Guilherme G. Piccoli
> > <gpiccoli@xxxxxxxxxxxxx> wrote:
> >
> > Thanks for writing again. I was caught up in trying several other
> > suggestions/code-snippets to further debug this.
> > I tried several combinations - turning iommu off, turning off swiotlb
> > in the kexec kernel and testing various combinations with
> > retain_initrd added to the kexec kernel's bootargs.
> >
> > But nothing seems to fix the failing nested, repetitive kexec reboots
> > on the AWS t3 machines I have. It only improves on a few instances
> > (i.e. the kexec reboots survive around 10 nested repetitive
> > attempts), while on the other(s) the failure can be seen quite
> > frequently (after approx. 3 kexec reboot attempts).
>
> Hi Bhupesh, thanks for the tests! Indeed, this problem is difficult to
> prevent with those parameters, and it's quite interesting to see how it
> may vary among instances.

Indeed.

> > [...]
> > This is a really good debug and resulting patch.
> > I ran approximately 60 repetitive kexec attempts last night and
> > repeated the same this morning, and
> > the issue seems to be fixed for me with upstream kernel 5.6.0-rc6+
> > plus this patch.
> >
> > I am leaving a test running with RHEL kernel + this patch overnight
> > and will have more updates to share by tomorrow morning.
>
> Thanks a lot =)
> I must give due credit to my friend Gavin Shan for the great
> suggestion that resulted in the patch! Let me know your results with the
> patch Bhupesh, and your Tested-by on it is much appreciated.
>
>
> >
> >> Bhupesh, I've noticed that the Red Hat Bugzilla suddenly became private -
> >
> > Oops. I will check.
> >
> >> is it okay to add me in CC list so I can see it?
> >
> > Sure. I tried doing it, but it seems Bugzilla is not happy, as it keeps
> > complaining that you are not registered on BZ.
> > I will try to find out internally how to get around the issue.
> >
>
> Great! If you need me to sign up for Bugzilla, I can do it. Just let me
> know the steps and I'd be glad to do that.

Yes, please. I checked internally. If you can sign up for Bugzilla, I
can directly add you to the Cc field of the Bugzilla work item.

> >> Thanks for all the collaboration, I hope the issue was figured out and solved!
> >
> > Sure. Thanks a lot for your inputs and trying the suggestions I posted
> > on the Bugzilla ticket.
> > I will soon share an update with RHEL/Fedora kernel kexec tests with
> > this patch applied and also reply with a Tested-by for the upstream
> > patch in the relevant thread.
> >
> > Thanks,
> > Bhupesh
> >
>
> Thank you, I appreciate the tests and collaboration =)
> Cheers,

No problem. The good news is that two runs of approximately 200
nested kexec reboots each worked even with RHEL/Fedora + your patch on the
AWS t3 instance for me.

So, this looks like a really good patch to have upstream. Thanks a lot
for sharing and working on it.

I will go ahead and add my Tested-by for the upstream patch as well.

Thanks for all your help,
Bhupesh


_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec


