> I still don't feel my questions have been well answered. Its really
> not clear to me why, in order to allow the level-2 guest to use a vdso
> that the answer is to export more data through the entire stack rather
> then to make the kvmclock to be usable from the vsyscall.

Thanks, this helps. A stable kvmclock is already usable from the
vsyscall. It is however not yet usable _in the hypervisor_ as a way to
provide another stable kvmclock to the nested guest; right now the only
clocksource that a hypervisor can use to provide a stable kvmclock is
the TSC.

So, regarding the "why is it necessary" part. Even on a modern host
with invariant TSC, kvmclock mediates between TSC and the guest and
provides, for example, support for live migration, where the TSC
frequency may be different between source and destination. If the L1
hypervisor could use the TSC to provide a stable kvmclock, there would
be no need for kvmclock in the first place.

The paravirtualized clock may well disappear in a few years since
Skylake provides TSC scaling. However, I'm not that optimistic because
people are complaining that I removed support for 2007 processors and
it seems that I'll have to put it back. So, as more people use nested
virtualization (and we have nested virt migration in the works, too),
nested kvmclock becomes more important too.

Regarding the "why is it best" part. Right now, the hypervisor makes a
copy of the timekeeper information in order to prepare the stable
kvmclock. This code is very much tied to the TSC. However, a snapshot
of the timekeeper information is almost entirely the same thing that
ktime_get_snapshot returns, so my suggestion to "untie" the hypervisor
code from the TSC was to use ktime_get_snapshot instead. This way, the
clocksource itself tells KVM whether it can be the base for a
vsyscall-happy kvmclock (which means, it must be the TSC or a linear
transformation of it).

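To make "a linear transformation of the TSC" a bit more concrete, here
is a minimal sketch of the kind of scaling a pvclock-style reader
applies to a TSC delta. The struct and function names are hypothetical
and this is not the kernel code; it only assumes a base TSC value, a
base time in nanoseconds, a 32.32 fixed-point multiplier and a shift.

```c
#include <stdint.h>

/* Hypothetical, simplified parameters of a clock that is a linear
 * function of the TSC (illustration only, not a kernel structure). */
struct linear_clock {
	uint64_t tsc_base;	/* TSC value when the snapshot was taken */
	uint64_t ns_base;	/* nanoseconds at that same instant */
	uint32_t mul;		/* 32.32 fixed-point multiplier */
	int8_t shift;		/* shift applied to the TSC delta first */
};

/* ns = ns_base + ((tsc - tsc_base) shifted by 'shift') * mul / 2^32 */
static uint64_t linear_clock_read(const struct linear_clock *c, uint64_t tsc)
{
	uint64_t delta = tsc - c->tsc_base;
	uint64_t hi, lo;

	if (c->shift >= 0)
		delta <<= c->shift;
	else
		delta >>= -c->shift;

	/* Split the 64x32 multiply so no 128-bit type is needed. */
	hi = delta >> 32;
	lo = delta & 0xffffffffu;
	return c->ns_base + hi * c->mul + ((lo * c->mul) >> 32);
}
```

For example, with mul = 0x80000000 (0.5 in 32.32 fixed point, i.e. a
2 GHz TSC), shift = 0, tsc_base = 1000 and ns_base = 500, reading at
tsc = 3000 gives 1500 ns. Anything that can be expressed this way on
top of the TSC is what I mean by a clocksource that can back a stable
kvmclock.
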
While I am very happy with how the KVM code comes out, it may well not
be the best solution. I definitely need help from the clocksource
maintainers here, not just approval! It doesn't help that a lot of the
code surrounding ktime_get_snapshot is unused, so that may have sent me
off track.

In particular, the return value of the new callback can be defined as
"is it the TSC or a linear transformation of it". But that's as good a
definition as "is it good for KVM" (i.e., not very good) without some
documentation on the meaning of "cycles" in the struct returned by
ktime_get_snapshot. Once I understand that, I hope I can provide a
better explanation for the return value of the callback.

Paolo

> So far for a problem statement, all I've got is:
> "However, when using nested virtualization you have
>
> L0: bare-metal hypervisor (uses TSC)
> L1: nested hypervisor (uses kvmclock, can use vsyscall)
> L2: nested guest
>
> and L2 cannot use vsyscall because it is not using the TSC."
>
> Which is a start but doesn't really make it clear why the proposed
> solution is best/necessary.
>
> thanks
> -john