On Thu, 18 Jul 2019, Dexuan Cui wrote: > > On Thu, 4 Jul 2019, Dexuan Cui wrote: > > This is the allocation when the CPU is brought online for the first > > time. So what effect has zeroing at allocation time vs. offlining and > > potentially receiving IPIs? That allocation is never freed. > > > > Neither the comment nor the changelog make any sense to me. > > tglx > > That allocation was introduced by the commit > a46d15cc1ae5 ("x86/hyper-v: allocate and use Virtual Processor Assist Pages"). > > I think it's ok to not free the page when a CPU is offlined: every > CPU uses only 1 page and CPU offlining is not really a very usual > operation except for the scenario of hibernation (and suspend-to-memory), > where the CPUs are quickly onlined again, when we resume from hibernation. > IMO Vitaly intentionally decided to not free the page for simplicity of the > code. > > When a CPU (e.g. CPU1) is being onlined, in hv_cpu_init(), we allocate the > VP_ASSIST_PAGE page and enable the PV EOI optimization for this CPU by > writing the MSR HV_X64_MSR_VP_ASSIST_PAGE. From now on, this CPU > *always* uses hvp->apic_assist (which is updated by the hypervisor) to > decide if it needs to write the EOI MSR: > > static void hv_apic_eoi_write(u32 reg, u32 val) > { > struct hv_vp_assist_page *hvp = hv_vp_assist_page[smp_processor_id()]; > > if (hvp && (xchg(&hvp->apic_assist, 0) & 0x1)) > return; > > wrmsr(HV_X64_MSR_EOI, val, 0); > } > > When a CPU (e.g. CPU1) is being offlined, on this CPU, we do: > 1. in hv_cpu_die(), we disable the PV EOI optimizaton for this CPU; > 2. we finish the remaining work of stopping this CPU; > 3. this CPU is completed stopped. > > Between 1 and 3, this CPU can still receive interrupts (e.g. IPIs from CPU0, > and Local APIC timer interrupts), and this CPU *must* write the EOI MSR for > every interrupt received, otherwise the hypervisor may not deliver further > interrupts, which may be needed to stop this CPU completely. > > So we need to make sure hvp->apic_assist.bit0 is zero, after we run the line > "wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);" in hv_cpu_die(). The easiest > way is what I do in this patch. Alternatively, we can use the below patch: > > @@ -188,8 +188,12 @@ static int hv_cpu_die(unsigned int cpu) > local_irq_restore(flags); > free_page((unsigned long)input_pg); > > - if (hv_vp_assist_page && hv_vp_assist_page[cpu]) > + if (hv_vp_assist_page && hv_vp_assist_page[cpu]) { > + local_irq_save(flags); > wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); > + hvp->apic_assist &= ~1; > + local_irq_restore(flags); > + } > > if (hv_reenlightenment_cb == NULL) > return 0; > > This second version needs 3+ lines, so I prefer the one-line version. :-) Those are two different things. The GPF_ZERO allocation makes sense on it's own but it _cannot_ prevent the following scenario: cpu_init() if (!hvp) hvp = vmalloc(...., GFP_ZERO); ... hvp->apic_assist |= 1; #1 cpu_die() if (....) wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); ---> IPI if (!(hvp->apic_assist & 1)) wrmsr(APIC_EOI); <- PATH not taken #3 cpu is dead cpu_init() if (!hvp) hvp = vmalloc(....m, GFP_ZERO); <- NOT TAKEN because hvp != NULL So you have to come up with a better fairy tale why GFP_ZERO 'fixes' this. Allocating hvp with GFP_ZERO makes sense on it's own so the allocated memory has a defined state, but that's a different story. The 3 liner patch above makes way more sense and you can spare the local_irq_save/restore by moving the whole condition into the irq_save/restore region above. Thanks, tglx