On Fri, 04 Jun 2021 10:20:28 +0100, Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> wrote: > > On (21/06/04 09:46), Marc Zyngier wrote: > [..] > > > +void kvm_arch_pm_notifier(struct kvm *kvm) > > > +{ > > > +#ifdef CONFIG_PM > > > + int c; > > > + > > > + mutex_lock(&kvm->lock); > > > + for (c = 0; c < kvm->created_vcpus; c++) { > > > + struct kvm_vcpu *vcpu = kvm->vcpus[c]; > > > + int r; > > > + > > > + if (!vcpu) > > > + continue; > > > > Wouldn't kvm_for_each_vcpu() avoid this kind of checks? > > Right, that's what I do in v2, which I haven't posted yet. > > [..] > > > +#include <linux/notifier.h> > > > + > > > #ifndef KVM_MAX_VCPU_ID > > > #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS > > > #endif > > > @@ -579,6 +581,10 @@ struct kvm { > > > pid_t userspace_pid; > > > unsigned int max_halt_poll_ns; > > > u32 dirty_ring_size; > > > + > > > +#ifdef CONFIG_PM > > > + struct notifier_block pm_notifier; > > > +#endif > > > > I'd certainly like to be able to opt out from this on architectures > > that do not implement anything useful in the PM callbacks. > > Well on the other hand PM-callbacks are harmless on those archs, they > won't overload the __weak function. I don't care much for the callbacks. But struct kvm is bloated enough, and I'd be happy not to have this structure embedded in it if I can avoid it. > > > Please consider making this an independent config option that individual > > archs can buy into. > > Sure, I can take a look into this, but how is this better than __weak > function? (that's a real question) Weak functions are OK AFAIC. More crud in struct kvm is what I'm not OK with. > > [..] > > > +#ifdef CONFIG_PM > > > +static int kvm_pm_notifier_call(struct notifier_block *bl, > > > + unsigned long state, > > > + void *unused) > > > +{ > > > + struct kvm *kvm = container_of(bl, struct kvm, pm_notifier); > > > + > > > + switch (state) { > > > + case PM_HIBERNATION_PREPARE: > > > + case PM_SUSPEND_PREPARE: > > > + kvm_arch_pm_notifier(kvm); > > > > How about passing the state to the notifier callback? I'd expect it to > > be useful to do something on resume too. > > For different states we can have different kvm_arch functions instead. > kvm_arch_pm_notifier() can be renamed to kvm_arch_suspend_notifier(), > so that we don't need to have `switch (state)` in every arch-code. Then > for resume/post resume states we can have kvm_arch_resume_notifier() > arch functions. I'd rather we keep an arch API that is similar to the one the rest of the kernel has, instead of a flurry of small helpers that need to grow each time someone adds a new PM state. A switch() in the arch-specific implementation is absolutely fine. > > > > + break; > > > + } > > > + return NOTIFY_DONE; > > > +} > > > + > > > +static void kvm_init_pm_notifier(struct kvm *kvm) > > > +{ > > > + kvm->pm_notifier.notifier_call = kvm_pm_notifier_call; > > > + kvm->pm_notifier.priority = INT_MAX; > > > > How is this priority determined? > > Nothing magical here. I want this to be executed first, before we suspend > ftrace, RCU and the like. Besides KVM is usually the last one to register > its PM callbacks, so there can be something on the notifier list that > returns NOTIFY_STOP_MASK in front of KVM PM-notifier list entry. Which begs the question: should arch-specific code be able to veto suspend and return an error itself? Always returning NOTIFY_DONE seems highly optimistic. M. -- Without deviation from the norm, progress is not possible.