RE: hyper_bf soft lockup on Azure Gen2 VM when taking kdump or executing kexec

From: Maxim Levitsky <mlevitsk@xxxxxxxxxx> Sent: Monday, February 10, 2025 3:57 PM
> 
> On Mon, 2025-02-10 at 21:35 +0000, Michael Kelley wrote:
> > From: thomas.tai@xxxxxxxxxx <thomas.tai@xxxxxxxxxx> Sent: Monday, February 10, 2025 7:08 AM
> > > <snip>
> > >
> > > > > Then the question is why the efifb driver doesn't work in the kdump
> > > > > kernel. Actually, it *does* work in many cases. I built the 6.13.0 kernel
> > > > > on the Oracle Linux 9.4 system, and transferred the kernel image binary
> > > > > and module binaries to an Ubuntu 20.04 VM in Azure. In that VM, the
> > > > > efifb driver is loaded as part of the kdump kernel, and it doesn't cause
> > > > > any problems. But there's an interesting difference. In the Oracle Linux
> > > > > 9.4 VM, the efifb driver finds the framebuffer at 0x40000000, while on
> > > > > the Ubuntu 20.04 VM, it finds the framebuffer at 0x40900000. This
> > > > > difference is due to differences in how the screen_info variable gets
> > > > > set up in the two VMs.
> > > > >
> > > > > When the normal kernel starts in a freshly booted VM, Hyper-V provides
> > > > > the EFI framebuffer at 0x40000000, and it works. But after the Hyper-V
> > > > > FB driver or Hyper-V DRM driver has initialized, Linux has picked a
> > > > > different MMIO address range and told Hyper-V to use the new
> > > > > address range (which often starts at 0x40900000). A kexec does *not*
> > > > > reset Hyper-V's transition to the new range, so when the efifb driver
> > > > > tries to use the framebuffer at 0x40000000, the accesses trap to
> > > > > Hyper-V and probably fail or time out (I'm not sure of the details). After
> > > > > the guest does some number of these bad references, Hyper-V considers
> > > > > itself to be under attack from an ill-behaving guest, and throttles the
> > > > > guest so that it doesn't run for a few seconds. The throttling repeats,
> > > > > and results in extremely slow running in the kdump kernel.
> > > > >
> > > > > Somehow in the Ubuntu 20.04 VM, the location of the frame buffer
> > > > > as stored in screen_info.lfb_base gets updated to be 0x40900000. I
> > > > > haven't fully debugged how that happens. But with that update, the
> > > > > efifb driver is using the updated framebuffer address and it works. On
> > > > > the Oracle Linux 9.4 system, that update doesn't appear to happen,
> > > > > and the problem occurs.
> > > > >
> > > > > This is an interim update on the problem. I'm still investigating how
> > > > > screen_info.lfb_base is set in the kdump kernel, and why it is different
> > > > > in the Ubuntu 20.04 VM vs. in the Oracle Linux 9.4 VM. Once that is
> > > > > well understood, we can contemplate how to fix the problem. Undoing
> > > > > the revert that is commit 2bebc3cd4870 doesn't seem like the solution
> > > > > since the original code there was reported to cause many other issues.
> > > > > The solution focus will likely be on how to ensure the kdump kernel gets
> > > > > the correct framebuffer address so the efifb driver works, since the
> > > > > framebuffer address changing is a quirk of Hyper-V behavior.
> > > > >
> > > > > If anyone else has insight into what's going on here, please chime in.
> > > > > What I've learned so far is still somewhat tentative.
> > > > >
> > > > Here's what is happening. On Ubuntu 20.04, the kdump image is
> > > > loaded into crash memory using the kexec command. Ubuntu 20.04
> > > > has kexec from the kexec-tools package version 2.0.18-1ubuntu1.1,
> > > > and per the kexec man page, it defaults to using the older kexec_load()
> > > > system call. When using kexec_load(), the contents to be loaded into
> > > > crash memory are constructed in user space by the kexec command.
> > > > The kexec command gets the "screen_info" settings, including the
> > > > physical address of the frame buffer, via the FBIOGET_FSCREENINFO
> > > > ioctl against /dev/fb0. The Hyper-V FB or DRM driver registers itself
> > > > with the fbdev subsystem so that it is /dev/fb0, and the ioctl returns
> > > > the updated framebuffer address. So the efifb driver loads and runs
> > > > correctly.
> > > >
> > > > On Oracle Linux 9.4, the kdump image is also loaded with the
> > > > kexec command, but from kexec-tools package version
> > > > kexec-tools-2.0.28-1.0.10.el9_5.x86_64, which is slightly later than
> > > > the version on Ubuntu 20.04. This newer kexec defaults to using the
> > > > newer kexec_file_load() system call. This system call gets the
> > > > framebuffer address from the screen_info variable in the kernel, which
> > > > has not been updated to reflect the new framebuffer address. Hence
> > > > in the kdump kernel, the efifb driver uses the old framebuffer address,
> > > > and hence the problem.
> > > >
> > > > To further complicate matters, the kexec on Oracle Linux 9.4 seems to
> > > > have a bug when the -c option forces the use of kexec_load() instead
> > > > of kexec_file_load(). As an experiment, I modified the kdumpctl shell
> > > > script to add the "-c" option to kexec, but in that case the value "0x0"
> > > > is passed as the framebuffer address, which is wrong. Furthermore,
> > > > the "screen_info.orig_video_isVGA" value (which I mentioned earlier
> > > > in connection with commit 2bebc3cd4870) is also set to 0, so the
> > > > kdump kernel no longer thinks it has an EFI framebuffer. Hence the
> > > > efifb driver isn't loaded, and the kdump works, though for the wrong
> > > > reasons. If kexec 2.0.18 from Ubuntu is copied onto the Oracle Linux 9.4
> > > > VM, then kdump works as expected, with the efifb driver being loaded
> > > > and using the correct framebuffer address. So something is going wrong
> > > > with kexec 2.0.28 in how it sets up the screen_info when the -c option
> > > > is used. I'll leave the debugging of the kexec bug to someone else.
> > >
> > > Hi Michael,
> > >
> > > Do you think we need to handle Azure Gen2 VM differently in the kexec?
> > >
> > > Or should we change the kexec_file_load() system call to retrieve the correct
> > > framebuffer address?
> >
> > I'm thinking there may be a fix in the Hyper-V FB and Hyper-V DRM drivers.
> > Commit c25a19afb81c may also be a cause of the problem -- see precursor
> > commit 3cb73bc3fa2a, which describes exactly the problem. I still need to
> > do some testing, but without that commit, kdump won't detect that it has
> > an EFI framebuffer, won't load the efifb driver, and so won't encounter the
> > problem. But we probably need to get Thomas Zimmermann to weigh in on
> > the implications of reverting c25a19afb81c.
> >
> > There's one additional variation of the problem. Assume the Hyper-V FB
> > driver is loaded (for example) during boot and moves the framebuffer. Then
> > the system runs kexec as part of arming kdump during the boot sequence.
> > The most recent location of the framebuffer (and whether it is an EFI framebuffer)
> > gets picked at the time kexec runs, and is stored in the crash kernel memory area.
> > But what if the framebuffer later moves, perhaps because the Hyper-V FB driver
> > is unbound? The crash kernel memory area doesn’t get updated and kdump
> > could still have the wrong framebuffer address. This anomaly argues for the
> > commit 3cb73bc3fa2a approach of just ensuring that the efifb driver doesn't
> > load. Of course that approach means that the kdump kernel *must* contain
> > either the Hyper-V FB or Hyper-V DRM driver in order to work on a system
> > with only a framebuffer for text output. The efifb driver won't work. But
> > perhaps that's OK.
> >
> > Changing kexec (or the invoking script) to special case Hyper-V Gen 2 VMs and
> > always use kexec_load() instead of kexec_file_load() sounds like a big hack
> > to me.  And with that approach, you give up the ability to enforce loading only
> > properly signed kdump images. This is something kexec_file_load() provides
> > that kexec_load() doesn't, and is one of the main reasons that kexec_file_load()
> > was added.
> >
> > Whether the kexec_file_load() system call could be enhanced to get the
> > frame buffer information from the /dev/fb0 device, I'm not sure. That might
> > be a reasonable approach, though it still has the problem that the framebuffer
> > address could change *after* kexec_file_load() runs.
> >
> > Anyway, that's a dump of my current thoughts. I haven't reached a final
> > conclusion or recommendation yet. Comments from others on the
> > thread are welcome.
> 
> Hi!
> 
> Asking because I also had to do some digging in this area:
> 
> Do you think that the kernel can *ask* the hypervisor where the framebuffer is instead
> of relying on bios, the bootloader and/or kexec to somehow provide this information?
> 
> If Hyper-V doesn't provide this API, how hard would it be, in your opinion, to provide it?
> 
> I am asking because I also had to debug a RHEL downstream issue where a slightly
> botched backport ensured that the first stage of the compressed UEFI boot image
> stopped passing the 'screen_info' to the second stage (the kernel itself), and as a
> result of this, the second stage stopped loading simplefb, and as a result of *this*,
> the PCI driver started to try to use the framebuffer range for its own use, which
> failed and resulted in a cryptic error.
> 
> If the kernel were to just issue some form of hypercall to ask the hypervisor where the
> framebuffer currently is, we could avoid a whole class of bugs similar to this.
> What do you think?
> 

I'm not aware of a way to ask Hyper-V about the framebuffer location.
I had not previously thought about such a possibility, so it's worth
thinking through. Here's how I see it: The issue is with generic drivers like
efifb (and others) that are hardcoded to read screen_info.lfb_base to
find the framebuffer. So the proposed new hypercall would need to be
made relatively early during boot, and it would update screen_info.lfb_base
to reflect the current location of the framebuffer. Hypercalls can only be
made after the setup in hyperv_init() is done. Fortunately, that's probably
before any framebuffer driver would read screen_info.lfb_base, though I'm
not completely sure.
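
To make the update step concrete, here is a minimal Python model of it, not
kernel code. The ScreenInfo class mirrors only the lfb_base, ext_lfb_base, and
capabilities fields of struct screen_info, and query_current_fb is a
hypothetical stand-in for the proposed hypercall, which as far as I know does
not exist today. All function and class names are made up for illustration.

```python
# Value of VIDEO_CAPABILITY_64BIT_BASE in include/uapi/linux/screen_info.h
VIDEO_CAPABILITY_64BIT_BASE = 1 << 1


class ScreenInfo:
    """Userspace stand-in for the few struct screen_info fields involved."""

    def __init__(self, lfb_base=0, ext_lfb_base=0, capabilities=0):
        self.lfb_base = lfb_base          # low 32 bits of the framebuffer address
        self.ext_lfb_base = ext_lfb_base  # high 32 bits, valid with the 64-bit flag
        self.capabilities = capabilities

    def base(self):
        """Address a generic driver such as efifb would end up using."""
        addr = self.lfb_base
        if self.capabilities & VIDEO_CAPABILITY_64BIT_BASE:
            addr |= self.ext_lfb_base << 32
        return addr


def refresh_lfb_base(si, query_current_fb):
    """Overwrite a possibly stale address with what the host reports.

    query_current_fb is HYPOTHETICAL: it models the proposed hypercall and
    returns the current framebuffer address, or None if the host cannot say.
    """
    addr = query_current_fb()
    if addr is None:
        return False  # keep whatever the bootloader or kexec passed in
    si.lfb_base = addr & 0xFFFFFFFF
    si.ext_lfb_base = addr >> 32
    if si.ext_lfb_base:
        si.capabilities |= VIDEO_CAPABILITY_64BIT_BASE
    else:
        si.capabilities &= ~VIDEO_CAPABILITY_64BIT_BASE
    return True


if __name__ == "__main__":
    si = ScreenInfo(lfb_base=0x40000000)         # stale address from first boot
    refresh_lfb_base(si, lambda: 0x40900000)     # host says it has moved
    print(hex(si.base()))
```

The model only captures the data flow. The hard part is the ordering:
the refresh helps only if it runs before efifb or any other generic driver
reads the address.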

Another factor is that the Hyper-V framebuffer is provided by the QEMU
equivalent that's embedded in the overall Hyper-V host, and not by
the hypervisor itself. The framebuffer is a VMBus device. So the Hyper-V
people would probably want getting the framebuffer location to be a
VMBus message to the framebuffer device, not a hypercall. And the VMBus
machinery isn't set up until later -- too late, in fact, to change
screen_info.lfb_base before some generic driver reads it. So that's likely
to be a problem with the idea, though I'm speculating on what the
Hyper-V folks would say.

The last factor is getting Hyper-V to add the feature. Somebody on the
Microsoft side would need to carry that request to the Hyper-V team.
I'm former Microsoft, but retired 1+ years ago, so I'm now just an unpaid
hobbyist contributing to the kernel because I enjoy the challenge. :-) But
I no longer have the Microsoft insider connection to the Hyper-V team.
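
For anyone who wants to check which address a kexec_load()-based kexec would
pick up, the FBIOGET_FSCREENINFO query against /dev/fb0 discussed earlier in
the thread can be sketched in Python as below. This is only a sketch: the
struct format string is a transcription of struct fb_fix_screeninfo from
<linux/fb.h> and assumes native alignment on the machine where it runs, and
parse_fix() and read_fb_fix() are names made up for illustration. It is not
how kexec-tools itself is written.

```python
import fcntl
import struct

FBIOGET_FSCREENINFO = 0x4602  # ioctl number from <linux/fb.h>

# struct fb_fix_screeninfo: id[16], smem_start, smem_len, type, type_aux,
# visual, xpanstep, ypanstep, ywrapstep, line_length, mmio_start, mmio_len,
# accel, capabilities, reserved[2]. The trailing "0L" pads the end of the
# buffer to the alignment of unsigned long, as a C compiler would.
FIX_FMT = "@16sLIIIIHHHILIIH2H0L"
FIX_SIZE = struct.calcsize(FIX_FMT)


def parse_fix(buf):
    """Decode the fields of interest from a raw fb_fix_screeninfo buffer."""
    fields = struct.unpack(FIX_FMT, bytes(buf[:FIX_SIZE]))
    return {
        "id": fields[0].split(b"\0", 1)[0].decode("ascii", "replace"),
        "smem_start": fields[1],  # physical start of framebuffer memory
        "smem_len": fields[2],    # length of framebuffer memory in bytes
    }


def read_fb_fix(path="/dev/fb0"):
    """Ask the fbdev driver behind `path` for its fixed screen info."""
    buf = bytearray(FIX_SIZE)
    with open(path, "rb") as fb:
        fcntl.ioctl(fb, FBIOGET_FSCREENINFO, buf, True)
    return parse_fix(buf)


if __name__ == "__main__":
    try:
        info = read_fb_fix()
    except OSError as err:
        print("could not query /dev/fb0:", err)
    else:
        print("driver id :", info["id"])
        print("smem_start:", hex(info["smem_start"]))
        print("smem_len  :", info["smem_len"])
```

Comparing the printed smem_start with the framebuffer address the efifb
driver reports in the kdump kernel should show whether the two have diverged.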
