Re: [PATCH v8 003/103] KVM: Refactor CPU compatibility check on module initialization

Isaku Yamahata <isaku.yamahata@xxxxxxxxx> · Mon, 22 Aug 2022 22:27:11 -0700

On Fri, Aug 12, 2022 at 11:35:29AM +0000,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> >   3.  Provide arch hooks that are invoked for "power management" operations (including
> >       CPU hotplug and host reboot, hence the quotes).  Note, there's both a platform-
> >       wide PM notifier and a per-CPU notifier...
> > 
> >   4.  Rename kvm_arch_post_init_vm() to e.g. kvm_arch_add_vm(), call it under
> >       kvm_lock, and pass in kvm_usage_count.
> > 
> >   5a. Drop cpus_hardware_enabled and drop the common hardware enable/disable code.
> > 
> >  or 
> > 
> >   5b. Expose kvm_hardware_enable_all() and/or kvm_hardware_enable() so that archs
> >       don't need to implement their own error handling and per-CPU flags.
> > 
> > I.e. give each architecture hooks to handle possible transition points, but otherwise
> > let arch code decide when and how to do hardware enabling/disabling. 
> > 
> > I'm very tempted to vote for (5a); x86 is the only architecture has an error path
> > in kvm_arch_hardware_enable(), and trying to get common code to play nice with arm's
> > kvm_arm_hardware_enabled logic is probably going to be weird.
> > 

I ended up with (5a) with the following RFC patches.
https://lore.kernel.org/kvm/cover.1660974106.git.isaku.yamahata@xxxxxxxxx/T/#m0239e7800b66174b49c5b1049462aad50293a994

> > E.g. if we can get the back half kvm_create_vm() to look like the below, then arch
> > code can enable hardware during kvm_arch_add_vm() if the existing count is zero
> > without generic KVM needing to worry about when hardware needs to be enabled and
> > disabled.
> > 
> > 	r = kvm_arch_init_vm(kvm, type);
> > 	if (r)
> > 		goto out_err_no_arch_destroy_vm;
> > 
> > 	r = kvm_init_mmu_notifier(kvm);
> > 	if (r)
> > 		goto out_err_no_mmu_notifier;
> > 
> > 	/*
> > 	 * When the fd passed to this ioctl() is opened it pins the module,
> > 	 * but try_module_get() also prevents getting a reference if the module
> > 	 * is in MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait").
> > 	 */
> > 	if (!try_module_get(kvm_chardev_ops.owner)) {
> > 		r = -ENODEV;
> > 		goto out_err;
> > 	}
> > 
> > 	mutex_lock(&kvm_lock);
> > 	cpus_read_lock();
> > 	r = kvm_arch_add_vm(kvm, kvm_usage_count);
> 
> Holding cpus_read_lock() here implies CPU hotplug cannot happen during
> kvm_arch_add_vm().  This needs a justification/comment to explain why.  
> 
> Also, assuming we have a justification, since (based on your description above)
> arch _may_ choose to enable hardware within it, but it is not a _must_.  So
> maybe remove cpus_read_lock() here and let kvm_arch_add_vm() to decide whether
> to use it?

For now, I put locking outside of kvm_arch_{add, del}_vm().  But I haven't looked
at non-x86 arch.  Probably arm code has its preference for its implementation.
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>