Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support

Marcelo Tosatti <mtosatti@xxxxxxxxxx> · Wed, 3 Oct 2018 16:00:29 -0300

On Tue, Oct 02, 2018 at 10:15:49PM -0700, Andy Lutomirski wrote:
> Hi Vitaly, Paolo, Radim, etc.,
> 
> On Fri, Sep 14, 2018 at 5:52 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >
> > Matt attempted to add CLOCK_TAI support to the VDSO clock_gettime()
> > implementation, which extended the clockid switch case and added yet
> > another slightly different copy of the same code.
> >
> > Especially the extended switch case is problematic as the compiler tends to
> > generate a jump table which then requires to use retpolines. If jump tables
> > are disabled it adds yet another conditional to the existing maze.
> >
> > This series takes a different approach by consolidating the almost
> > identical functions into one implementation for high resolution clocks and
> > one for the coarse grained clock ids by storing the base data for each
> > clock id in an array which is indexed by the clock id.
> >
> 
> I was trying to understand more of the implications of this patch
> series, and I was again reminded that there is an entire extra copy of
> the vclock reading code in arch/x86/kvm/x86.c.  And the purpose of
> that code is very, very opaque.
> 
> Can one of you explain what the code is even doing?  From a couple of
> attempts to read through it, it's a whole bunch of
> probably-extremely-buggy code that, 

Yes, probably.

> drumroll please, tries to atomically read the TSC value and the time.  And decide whether the
> result is "based on the TSC".  

I think "based on the TSC" refers to whether TSC clocksource is being
used.

> And then synthesizes a TSC-to-ns
> multiplier and shift, based on *something other than the actual
> multiply and shift used*.
> 
> IOW, unless I'm totally misunderstanding it, the code digs into the
> private arch clocksource data intended for the vDSO, uses a poorly
> maintained copy of the vDSO code to read the time (instead of doing
> the sane thing and using the kernel interfaces for this), and
> propagates a totally made up copy to the guest.

I posted kernel interfaces for this, and it was suggested to 
instead write a "in-kernel user of pvclock data".

If you can get kernel interfaces to replace that, go for it. I prefer
kernel interfaces as well.

>  And gets it entirely
> wrong when doing nested virt, since, unless there's some secret in
> this maze, it doesn't acutlaly use the scaling factor from the host
> when it tells the guest what to do.
> 
> I am really, seriously tempted to send a patch to simply delete all
> this code.  

If your patch which deletes the code gets the necessary features right,
sure, go for it.

> The correct way to do it is to hook

Can you expand on the correct way to do it?

> And I don't see how it's even possible to pass kvmclock correctly to
> the L2 guest when L0 is hyperv.  KVM could pass *hyperv's* clock, but
> L1 isn't notified when the data structure changes, so how the heck is
> it supposed to update the kvmclock structure?

I don't parse your question.

> 
> --Andy
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel