Mark

Cc: Marc, Geoff

On 04/10/2015 12:02 AM, Mark Rutland wrote:
> On Thu, Apr 09, 2015 at 05:53:33AM +0100, AKASHI Takahiro wrote:
>> Mark,
>>
>> On 04/08/2015 10:05 PM, Mark Rutland wrote:
>>> On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
>>>> The current kvm implementation keeps EL2 vector table installed even
>>>> when the system is shut down. This prevents kexec from putting the system
>>>> with kvm back into EL2 when starting a new kernel.
>>>>
>>>> This patch resolves this issue by calling a cpu tear-down function via
>>>> reboot notifier, kvm_reboot_notify(), which is invoked by
>>>> kernel_restart_prepare() in kernel_kexec().
>>>> While kvm has a generic hook, kvm_reboot(), we can't use it here because
>>>> a cpu teardown function will not be invoked, under current implementation,
>>>> if no guest vm has been created by kvm_create_vm().
>>>> Please note that kvm_usage_count is zero in this case.
>>>>
>>>> We'd better, in the future, implement cpu hotplug support and put the
>>>> arch-specific initialization into kvm_arch_hardware_enable/disable().
>>>> This way, we would be able to revert this patch.
>>>
>>> Why can't we use kvm_arch_hardware_enable/disable() currently?
>>
>> IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being
>> created *and* cpus have not been initialized yet. kvm_usage_count==0
>> indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever
>> a guest is being terminated (i.e. kvm_usage_count != 0).
>> Therefore if kvm_arch_hardware_enable/disable() also handle EL2 vector table
>> initialization, we don't have to have any particular operations, as my patch
>> does, for kexec case.
>> (a long-term solution)
>>
>> Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why),
>> I'm trying to fix the problem by adding a minimum tear-down function, kvm_cpu_reset,
>> and invoking it via a reboot hook.
>> (an interim fix)
>
> What I don't understand is why we can't move the init and tear-down
> functions into kvm_arch_hardware_enable/disable(). They seem to be for
> precisely what you are implementing, with the only difference being the
> time that they are called.

I don't know either. I just followed the discussions between Marc and Geoff,
and their conclusion. I guessed that *refactoring* might be more complicated
than expected.

FYI, I gave a quick try to the kvm_arch_hardware_enable() approach by removing
cpu_init_hyp_mode() from init_hyp_mode() and putting it into
kvm_arch_hardware_enable(), and it seems to work, at least in my environment:
    boot => start a kvm guest => kexec reboot => start a kvm guest

> Either I'm missing something, or we can simply implement the existing
> hooks. I assume I'm missing something.

Marc, Geoff, any comments?

>>>> +static struct notifier_block kvm_reboot_nb = {
>>>> +       .notifier_call = kvm_reboot_notify,
>>>> +       .next = NULL,
>>>> +       .priority = 0, /* FIXME */
>>>
>>> It would be helpful for the comment to explain why this is wrong, and
>>> what needs fixing.
>>
>> Thanks for reminding me of this.
>>
>> *priority* enforces a calling order of registered hook functions.
>> If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
>> (Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
>> notifier_call_chain().)
>>
>> So we should make sure that kvm_reboot_notify() be called
>> 1) after any hook functions which may depend on kvm, and
>
> Which hooks depend on KVM?

I think I answered this question below:

>> But how can we guarantee this and determine a priority of kvm_reboot_notify()?
>> Looking into all the occurrences of register_reboot_notifier(),
>> 1) => nothing
>> 2) => virt/kvm/kvm_main.c (priority: 0)
>> 3) => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0)
>>       drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)
>>
>> So a priority higher than zero might be safe and better, but exactly what?
>> Some hooks use "INT_MAX."

Thanks,
-Takahiro AKASHI

>> 2) before any hook functions which kvm may depend on, and
>
> Which other hooks does KVM depend on?
>
>> 3) before any hook functions that may return NOTIFY_STOP_MASK
>
> I think this would be solved by using kvm_arch_hardware_enable/disable.
> As far as I can tell, the VMs would be destroyed earlier (and hence KVM
> disabled) before we got to the final teardown.
>
> Thanks,
> Mark.
>
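
For reference, the priority question above can be illustrated with a small userspace sketch of a notifier chain. This is not kernel code: chain_register() and chain_call() are simplified stand-ins written for this example (they only approximate what register_reboot_notifier() and notifier_call_chain() do), and the two hooks and demo_order() are hypothetical. It assumes the chain is kept sorted by descending priority, with equal priorities called in registration order, and that a hook returning NOTIFY_STOP_MASK ends the walk.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Values mirror include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
	int priority;
};

/* Simplified stand-in: keep the list sorted by descending priority;
 * a new entry goes after existing entries of the same priority. */
static void chain_register(struct notifier_block **head,
			   struct notifier_block *n)
{
	assert(n != NULL && n->notifier_call != NULL);
	while (*head != NULL && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Simplified stand-in: call hooks in list order and stop at the first
 * one whose return value has NOTIFY_STOP_MASK set. */
static int chain_call(struct notifier_block *head,
		      unsigned long action, void *data)
{
	int ret = NOTIFY_DONE;

	while (head != NULL) {
		struct notifier_block *next = head->next;

		ret = head->notifier_call(head, action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
		head = next;
	}
	return ret;
}

/* Append one character to the NUL-terminated trace buffer. */
static void trace_append(char *trace, char c)
{
	size_t len = strlen(trace);

	trace[len] = c;
	trace[len + 1] = '\0';
}

/* Hypothetical hook standing in for kvm_reboot_notify(). */
static int hypothetical_kvm_hook(struct notifier_block *nb,
				 unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	trace_append(data, 'K');
	return NOTIFY_DONE;
}

/* Hypothetical hook that stops the chain. */
static int hypothetical_stopping_hook(struct notifier_block *nb,
				      unsigned long action, void *data)
{
	(void)nb;
	(void)action;
	trace_append(data, 'S');
	return NOTIFY_STOP_MASK;
}

/* Register a priority-0 stopping hook first, then the kvm stand-in with
 * the given priority, and record the call order in trace. */
void demo_order(int kvm_priority, char *trace)
{
	struct notifier_block *head = NULL;
	struct notifier_block stopper = {
		.notifier_call = hypothetical_stopping_hook,
		.priority = 0,
	};
	struct notifier_block kvm = {
		.notifier_call = hypothetical_kvm_hook,
		.priority = kvm_priority,
	};

	chain_register(&head, &stopper);
	chain_register(&head, &kvm);
	chain_call(head, 0, trace);
}
```

Under these assumptions, a kvm hook left at priority 0 and registered after a stopping hook of the same priority is never reached, while any higher priority (INT_MAX included) puts it ahead of that hook.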