From: Dexuan Cui <decui@xxxxxxxxxxxxx> Sent: Monday, December 21, 2020 5:04 PM
>
> > From: Michael Kelley
> > Sent: Monday, December 21, 2020 3:33 PM
> > From: Dexuan Cui
> > Sent: Sunday, December 20, 2020 2:30 PM
> > >
> > > Currently the kexec kernel can panic or hang due to 2 causes:
> > >
> > > 1) hv_cpu_die() is not called upon kexec, so the hypervisor corrupts the
> > > VP Assist Pages when the kexec kernel runs. We ever fixed the same issue
> >
> > Spurious word "ever".  And avoid first person "we".  Perhaps:
> >
> > The same issue is fixed for hibernation in commit ..... .  Now fix it for
> > kexec.
>
> Thanks! Will use this in v2.
>
> > > for hibernation in
> > > commit 421f090c819d ("x86/hyperv: Suspend/resume the VP assist page for
> > > hibernation")
> > > and now let's fix it for kexec.
> >
> > Is the VP Assist Page getting cleared anywhere on the panic path?  We can
>
> It's not.
>
> > only do it for the CPU that runs panic(), but I don't think it is getting cleared
> > even for that CPU.  It is cleared only in hv_cpu_die(), and that's not called on
> > the panic path.
>
> IMO kdump is different from the non-kdump kexec in that the kdump kernel
> runs without depending on the memory used by the first kernel, so it looks
> unnecessary to clear the first kernel's VP Assist Page (and the hypercall page).
> According to my test, the second kernel can re-enable the VP Assist Page and
> the hypercall page using different GPAs, without disabling the old pages first.

Ah yes. You are right. The kdump kernel must be using a disjoint area of
physical memory, so not clearing these per-CPU overlay pages shouldn't put
the kdump kernel at risk.

> Of course, in the future Hyper-V may require the guest to disable the pages
> first before trying to re-enable them, so I agree we'd better clear the pages
> in the first kernel like this:
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 4638a52d8eae..8022f51c9c05 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -202,7 +202,7 @@ void clear_hv_tscchange_cb(void)
>  }
>  EXPORT_SYMBOL_GPL(clear_hv_tscchange_cb);
>
> -static int hv_cpu_die(unsigned int cpu)
> +int hv_cpu_die(unsigned int cpu)
>  {
>  	struct hv_reenlightenment_control re_ctrl;
>  	unsigned int new_cpu;
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 30f76b966857..d090e781d216 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,8 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern int hyperv_init_cpuhp;
>
> +int hv_cpu_die(unsigned int cpu);
> +
>  extern void *hv_hypercall_pg;
>  extern void __percpu **hyperv_pcpu_input_arg;
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 43b54bef5448..e54f8262bfe0 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -156,6 +156,9 @@ static void hv_machine_crash_shutdown(struct pt_regs *regs)
>  	if (hv_crash_handler)
>  		hv_crash_handler(regs);
>
> +	/* Only call this on the faulting CPU. */
> +	hv_cpu_die(raw_smp_processor_id());
> +
>  	/* The function calls crash_smp_send_stop(). */
>  	native_machine_crash_shutdown(regs);

Since we don't *need* to do this, I think it might be less risky to just
leave the code "as is".  But I'm OK either way.
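
For completeness, if we did want to clear the page on the crashing CPU
without exporting hv_cpu_die(), I think the minimal alternative is just the
VP Assist Page MSR write that hv_cpu_die() does.  Untested sketch only (the
helper name is made up, and it skips the per-cpu input page and
reenlightenment cleanup that hv_cpu_die() also handles):

/*
 * Untested sketch, not part of the patch: disable the VP Assist Page on
 * the crashing CPU only, mirroring the wrmsrl() done in hv_cpu_die().
 */
static void hv_crash_clear_vp_assist(void)
{
	int cpu = raw_smp_processor_id();

	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
}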

> > > 2) hyperv_cleanup() is called too early. In the kexec path, the other CPUs
> > > are stopped in hv_machine_shutdown() -> native_machine_shutdown(), so
> > > between hv_kexec_handler() and native_machine_shutdown(), the other CPUs
> > > can still try to access the hypercall page and cause panic. The workaround
> > > "hv_hypercall_pg = NULL;" in hyperv_cleanup() can't work reliably.
> >
> > I would note that the comment in hv_suspend() is also incorrect on this
> > point.  Setting hv_hypercall_pg to NULL does not cause subsequent
> > hypercalls to fail safely.  The fast hypercalls don't test for it, and even if they
> > did test like hv_do_hypercall(), the test just creates a race condition.
>
> The comment in hv_suspend() should be correct because hv_suspend()
> is only called during hibernation from the syscore_ops path where only
> one CPU is active, e.g. for the suspend operation, it's called from
> state_store
>  hibernate
>   hibernation_snapshot
>    create_image
>     suspend_disable_secondary_cpus
>     syscore_suspend
>      hv_suspend
>
> It's similar for the resume operation:
> resume_store
>  software_resume
>   load_image_and_restore
>    hibernation_restore
>     resume_target_kernel
>      hibernate_resume_nonboot_cpu_disable
>      syscore_suspend
>       hv_suspend

I agree the hv_suspend() code is correct.  I read the second sentence of
the comment as being a more general statement that hypercalls could be
cleanly stopped by setting hv_hypercall_pg to NULL, which isn't true.

> > >  static void hv_machine_crash_shutdown(struct pt_regs *regs)
> > >  {
> > >  	if (hv_crash_handler)
> > >  		hv_crash_handler(regs);
> > > +
> > > +	/* The function calls crash_smp_send_stop(). */
> >
> > Actually, crash_smp_send_stop() or smp_send_stop() has already been
> > called earlier by panic(),
>
> This is true only when the Hyper-V host supports the feature
> HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE. On an old Hyper-V host
> without the feature, ms_hyperv_init_platform() doesn't set
> crash_kexec_post_notifiers, so crash_kexec_post_notifiers keeps its
> initial value "false", and panic() calls smp_send_stop() *after*
> __crash_kexec() (which calls machine_crash_shutdown() ->
> hv_machine_crash_shutdown()).

OK, I see your point.

> > so there's already only a single CPU running at
> > this point.  crash_smp_send_stop() is called again in
> > native_machine_crash_shutdown(), but it has a flag to prevent it from
> > running again.
> >
> > >  	native_machine_crash_shutdown(regs);
> > > +
> > > +	/* Disable the hypercall page when there is only 1 active CPU. */
> > > +	hyperv_cleanup();
> >
> > Moving the call to hyperv_cleanup() in the panic path is OK, and it makes
> > the panic and kexec() paths more similar, but I don't think it is necessary.
> > As noted above, the other CPUs have already been stopped, so they shouldn't
> > be executing any hypercalls.
>
> As I explained above, it's necessary for old Hyper-V hosts. :-)

Agreed.
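
Just to double-check that we end up in the same place: pulling the quoted
hunks together, my understanding is that hv_machine_crash_shutdown() in v2
will read roughly like this (leaving aside the optional hv_cpu_die() call
discussed above):

static void hv_machine_crash_shutdown(struct pt_regs *regs)
{
	if (hv_crash_handler)
		hv_crash_handler(regs);

	/* The function calls crash_smp_send_stop(). */
	native_machine_crash_shutdown(regs);

	/* Disable the hypercall page when there is only 1 active CPU. */
	hyperv_cleanup();
}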