On Tue, Feb 18, 2025 at 03:01:30PM -0800, mhkelley58@xxxxxxxxx wrote:
> From: Michael Kelley <mhklinux@xxxxxxxxxxx>
>
> Gen 2 Hyper-V VMs boot via EFI and have a standard EFI framebuffer
> device. When the kdump kernel runs in such a VM, loading the efifb
> driver may hang because of accessing the framebuffer at the wrong
> memory address.
>
> The scenario occurs when the hyperv_fb driver in the original kernel
> moves the framebuffer to a different MMIO address because of conflicts
> with an already-running efifb or simplefb driver. The hyperv_fb driver
> then informs Hyper-V of the change, which is allowed by the Hyper-V FB
> VMBus device protocol. However, when the kexec command loads the kdump
> kernel into crash memory via the kexec_file_load() system call, the
> system call doesn't know the framebuffer has moved, and it sets up the
> kdump screen_info using the original framebuffer address. The transition
> to the kdump kernel does not go through the Hyper-V host, so Hyper-V
> does not reset the framebuffer address like it would do on a reboot.
> When efifb tries to run, it accesses a non-existent framebuffer
> address, which traps to the Hyper-V host. After many such accesses,
> the Hyper-V host thinks the guest is being malicious, and throttles
> the guest to the point that it runs very slowly or appears to have hung.
>
> When the kdump kernel is loaded into crash memory via the kexec_load()
> system call, the problem does not occur. In this case, the kexec command
> builds the screen_info table itself in user space from data returned
> by the FBIOGET_FSCREENINFO ioctl against /dev/fb0, which gives it the
> new framebuffer location.
>
> This problem was originally reported in 2020 [1], resulting in commit
> 3cb73bc3fa2a ("hyperv_fb: Update screen_info after removing old
> framebuffer"). This commit solved the problem by setting orig_video_isVGA
> to 0, so the kdump kernel was unaware of the EFI framebuffer. The efifb
> driver did not try to load, and no hang occurred. But in 2024, commit
> c25a19afb81c ("fbdev/hyperv_fb: Do not clear global screen_info")
> effectively reverted 3cb73bc3fa2a. Commit c25a19afb81c has no reference
> to 3cb73bc3fa2a, so perhaps it was done without knowing the implications
> that were reported with 3cb73bc3fa2a. In any case, as of commit
> c25a19afb81c, the original problem came back.
>
> Interestingly, the hyperv_drm driver does not have this problem because
> it never moves the framebuffer. The difference is that the hyperv_drm
> driver removes any conflicting framebuffers *before* allocating an MMIO
> address, while the hyperv_fb driver removes conflicting framebuffers
> *after* allocating an MMIO address. With the "after" ordering, hyperv_fb
> may encounter a conflict and move the framebuffer to a different MMIO
> address. But the conflict is essentially bogus because it is removed
> a few lines of code later.
>
> Rather than fix the problem with the approach from 2020 in commit
> 3cb73bc3fa2a, instead slightly reorder the steps in hyperv_fb so
> conflicting framebuffers are removed before allocating an MMIO address.
> Then the default framebuffer MMIO address should always be available, and
> there's never any confusion about which framebuffer address the kdump
> kernel should use -- it's always the original address provided by
> the Hyper-V host. This approach is already used by the hyperv_drm
> driver, and is consistent with the usage guidelines at the head of
> the module containing the function aperture_remove_conflicting_devices().
>
> This approach also solves a related minor problem when kexec_load()
> is used to load the kdump kernel. With current code, unbinding and
> rebinding the hyperv_fb driver could result in the framebuffer moving
> back to the default framebuffer address, because on the rebind there
> are no conflicts. If such a move is done after the kdump kernel is
> loaded with the new framebuffer address, at kdump time it could again
> have the wrong address.
>
> This problem and fix are described in terms of the kdump kernel, but
> it can also occur with any kernel started via kexec.
>
> See extensive discussion of the problem and solution at [2].
>
> [1] https://lore.kernel.org/linux-hyperv/20201014092429.1415040-1-kasong@xxxxxxxxxx/
> [2] https://lore.kernel.org/linux-hyperv/BLAPR10MB521793485093FDB448F7B2E5FDE92@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> Reported-by: Thomas Tai <thomas.tai@xxxxxxxxxx>
> Fixes: c25a19afb81c ("fbdev/hyperv_fb: Do not clear global screen_info")
> Signed-off-by: Michael Kelley <mhklinux@xxxxxxxxxxx>

Applied to hyperv-fixes, thanks!