On Thu, Jul 29, 2021, Oliver Upton wrote: > Refactor kvm_synchronize_tsc to make a new function that allows callers > to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly > for the sake of participating in TSC synchronization. > > This changes the locking semantics around TSC writes. "refactor" and "changes the locking semantics" are somewhat contradictory. The correct way to do this is to first change the locking semantics, then extract the helper you want. That makes review and archaeology easier, and isolates the locking change in case it isn't so safe after all. > Writes to the TSC will now take the pvclock gtod lock while holding the tsc > write lock, whereas before these locks were disjoint. > > Reviewed-by: David Matlack <dmatlack@xxxxxxxxxx> > Signed-off-by: Oliver Upton <oupton@xxxxxxxxxx> > --- > +/* > + * Infers attempts to synchronize the guest's tsc from host writes. Sets the > + * offset for the vcpu and tracks the TSC matching generation that the vcpu > + * participates in. > + * > + * Must hold kvm->arch.tsc_write_lock to call this function. Drop this blurb, lockdep assertions exist for a reason :-) > + */ > +static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, > + u64 ns, bool matched) > +{ > + struct kvm *kvm = vcpu->kvm; > + bool already_matched; Eww, not your code, but "matched" and "already_matched" are not helpful names, e.g. they don't provide a clue as to _what_ matched, and thus don't explain why there are two separate variables. And I would expect an "already" variant to come in from the caller, not the other way 'round. matched => freq_matched already_matched => gen_matched > + unsigned long flags; > + > + lockdep_assert_held(&kvm->arch.tsc_write_lock); > + > + already_matched = > + (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); > + > + /* > + * We track the most recent recorded KHZ, write and time to > + * allow the matching interval to be extended at each write. > + */ > + kvm->arch.last_tsc_nsec = ns; > + kvm->arch.last_tsc_write = tsc; > + kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; > + > + vcpu->arch.last_guest_tsc = tsc; > + > + /* Keep track of which generation this VCPU has synchronized to */ > + vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; > + vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; > + vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; > + > + kvm_vcpu_write_tsc_offset(vcpu, offset); > + > + spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); I believe this can be spin_lock(), since AFAICT the caller _must_ disable IRQs when taking tsc_write_lock, i.e. we know IRQs are disabled at this point. > + if (!matched) { > + /* > + * We split periods of matched TSC writes into generations. > + * For each generation, we track the original measured > + * nanosecond time, offset, and write, so if TSCs are in > + * sync, we can match exact offset, and if not, we can match > + * exact software computation in compute_guest_tsc() > + * > + * These values are tracked in kvm->arch.cur_xxx variables. > + */ > + kvm->arch.nr_vcpus_matched_tsc = 0; > + kvm->arch.cur_tsc_generation++; > + kvm->arch.cur_tsc_nsec = ns; > + kvm->arch.cur_tsc_write = tsc; > + kvm->arch.cur_tsc_offset = offset; IMO, adjusting kvm->arch.cur_tsc_* belongs outside of pvclock_gtod_sync_lock. Based on the existing code, it is protected by tsc_write_lock. I don't care about the extra work while holding pvclock_gtod_sync_lock, but it's very confusing to see code that reads variables outside of a lock, then take a lock and write those same variables without first rechecking. > + matched = false; What's the point of clearing "matched"? It's already false... > + } else if (!already_matched) { > + kvm->arch.nr_vcpus_matched_tsc++; > + } > + > + kvm_track_tsc_matching(vcpu); > + spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); > +} > +