Thomas Gleixner wrote: > I tend to disagree. The clockevents infrastructure was designed to cope > with the existing mess of real hardware. The discussion over the last > days exposed me to even more exotic designs than the hardware vendors > were able to deliver until now. > It's a different but related problem domain. It's also an increasingly common execution environment for a kernel to find itself in. Dealing with proper paravirtualized timer devices is a big improvement over trying to reliably deal with fully virtualized hardware timers, which simply can't make the same guarantees that real hardware can make - such as "you will definitely get N ns of CPU time between doing the delta->absolute computation and programming the match register". > I know exactly where you are heading: > > Offload the handling of hypervisor design decisions to the kernel and > let us deal with that. So we need to implement 128 bit math to convert > back and forth and I expect more interesting things to creep up. > I wouldn't put it that way. We've been getting a lot of pressure to keep the pv_ops interface as small as possible. Reusing existing kernel interfaces rather than making up new ones is a good way to do that. The clock infrastructure certainly cleans things up; earlier Xen patches made a complete copy of the old kernel/time.c and hacked it around, which isn't what anyone wants to do. > All this is of _NO_ use and benefit for the kernel itself. > Lots of people want to run Linux in virtual machines. If we can make sane kernel changes to help those users, then that is of use an benefit to the kernel. > Real hardware copes well with relative deltas for the events, even when > it is match register based. I thought long about the support for > absolute expiry values in cycles and decided against them to avoid that > math hackery, which you folks now demand. > Not really. Xen and VMI interfaces both use absolute monotonic time for timeouts, which is certainly a common case for such interfaces (pthread_cond_timedwait, for example). Converting delta to absolute is clearly simple, but it does introduce an added bit of non-determinism if your CPU can be preempted from outside at any time. I presume SMM or similar interrupts can cause the same problem on real hardware. I guess the worst case for real hardware is an absolute-time match register which only compares for match==now rather than match<=now, since you could completely lose the time event if you miss the deadline. >> static const struct clock_event_device xen_clockevent = { >> .name = "xen", >> .features = CLOCK_EVT_FEAT_ONESHOT, >> >> .max_delta_ns = 0x7fffffff, >> .min_delta_ns = 100, /* ? */ >> >> .mult = 1<<XEN_SHIFT, >> .shift = XEN_SHIFT, >> > > We can optimize this by skipping the conversion via a feature flag. > The clocksource needed the shift for ntp warping. Does the clockevent need a shift at all? Could I just set mult/shift to 1/0? > Your implementation is almost the perfect prototype, if you move the > 128 bit hackery into the hypervisor and hide it away from the kernel :) > The point is to use the tsc to avoid making any hypercalls, so dealing with the tsc->ns conversion has to happen on the guest side somehow. > One of these is perfectly fine for _ALL_ of the hypervisor folks. > Anything else is just a backwards decision for the kernel. > That would certainly be ideal. We'll look at the xen, vmi, lguest and kvm paravirtualized time models and see how much they really have in common. I'm a bit curious about how vmi's time events make their way back into the system. J _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxx https://lists.osdl.org/mailman/listinfo/virtualization