On Thu, 2024-07-25 at 16:50 -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2024 at 08:35:40PM +0100, David Woodhouse wrote:
> > On Thu, 2024-07-25 at 12:38 -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
> > > > The use case isn't necessarily for all users of gettimeofday(), of
> > > > course; this is for those applications which *need* precision time.
> > > > Like distributed databases which rely on timestamps for coherency, and
> > > > users who get fined millions of dollars when LM messes up their clocks
> > > > and they put wrong timestamps on financial transactions.
> > >
> > > I would however worry that with all this pass through,
> > > applications have to be coded to each hypervisor or even
> > > version of the hypervisor.
> >
> > Yes, that would be a problem. Which is why I feel it's so important to
> > harmonise the contents of the shared memory, and I'm implementing it in
> > both QEMU and $DAYJOB, as well as aligning with virtio-rtc.
> >
> Writing an actual spec for this would be another thing that might help.
>
> > I don't think the structure should be changing between hypervisors (and
> > especially versions). We *will* see a progression from simply providing
> > the disruption signal, to providing the full clock information so that
> > guests don't have to abort transactions while they resync their clock.
> > But that's perfectly fine.
> >
> > And it's also entirely agnostic to the mechanism by which the memory
> > region is *discovered*. It doesn't matter if it's ACPI, DT, a
> > hypervisor enlightenment, a BAR of a simple PCI device, virtio, or
> > anything else.
> >
> > ACPI is one of the *simplest* options for a hypervisor and guest to
> > implement, and doesn't prevent us from using the same structure in
> > virtio-rtc. I'm happy enough using ACPI and letting virtio-rtc come
> > along later.
> >
> > > virtio has been developed with the painful experience that we keep
> > > making mistakes, or coming up with new needed features,
> > > and that maintaining forward and backward compatibility
> > > becomes a whole lot harder than it seems in the beginning.
> >
> > Yes. But as you note, this shared memory structure is a userspace ABI
> > all of its own, so we get to make a completely *different* kind of
> > mistake :)
> >
>
> So, something I still don't completely understand.
> Can't the VDSO thing be written to by the kernel?
> Let's say on LM, an interrupt triggers and kernel copies
> data from a specific device to the VDSO.
>
> Is that problematic somehow? I imagine there is a race where
> userspace reads vdso after LM but before kernel updated
> vdso - is that the concern?
>
> Then can't we fix it by interrupting all CPUs right after LM?
>
> To me that seems like a cleaner approach - we then compartmentalize
> the ABI issue - kernel has its own ABI against userspace,
> devices have their own ABI against kernel.
> It'd mean we need a way to detect that interrupt was sent,
> maybe yet another counter inside that structure.
>
> WDYT?
>
> By the way the same idea would work for snapshots -
> some people wanted to expose that info to userspace, too.
>
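A minimal C sketch of one way to read the "yet another counter" idea quoted above, under loudly stated assumptions: the layouts `struct dev_page` and `struct vdso_page` and the helpers `kernel_refresh()` and `vdso_read()` are hypothetical names for illustration, not an existing kernel, QEMU or virtio-rtc ABI. The kernel copies the device data into its vDSO page under a seqcount after the post-migration interrupt and records the device's disruption counter value it copied; the reader compares that recorded value with the live device counter, so it can detect the window where userspace runs after LM but before the kernel has refreshed the vDSO. It also assumes the hypervisor only updates the device page while the guest is paused.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page written only by the hypervisor/device. */
struct dev_page {
    _Atomic uint64_t disruption_count; /* bumped on LM or snapshot restore */
    _Atomic int64_t  clock_offset_ns;  /* simplified clock payload */
};

/* Hypothetical vDSO data page written only by the guest kernel. */
struct vdso_page {
    _Atomic uint32_t seq;             /* odd while the kernel is updating */
    _Atomic uint64_t seen_disruption; /* device counter value last copied */
    _Atomic int64_t  clock_offset_ns; /* kernel's copy of the payload */
};

/* Kernel side, e.g. called from the post-migration interrupt handler. */
static void kernel_refresh(struct vdso_page *v, struct dev_page *d)
{
    /* Single writer assumed: seq must be even on entry. */
    assert((atomic_load(&v->seq) & 1u) == 0);
    atomic_fetch_add(&v->seq, 1);     /* odd: readers retry */
    atomic_store(&v->seen_disruption, atomic_load(&d->disruption_count));
    atomic_store(&v->clock_offset_ns, atomic_load(&d->clock_offset_ns));
    atomic_fetch_add(&v->seq, 1);     /* even: copy is consistent */
}

/* Userspace side. Returns false if a disruption happened that the
 * kernel has not yet copied into the vDSO page; the caller would then
 * fall back to a slower path. */
static bool vdso_read(struct vdso_page *v, struct dev_page *d,
                      int64_t *offset_ns)
{
    uint32_t start;
    uint64_t seen = 0;
    int64_t off = 0;

    for (;;) {
        start = atomic_load(&v->seq);
        if (start & 1u)
            continue;                 /* update in progress, retry */
        seen = atomic_load(&v->seen_disruption);
        off = atomic_load(&v->clock_offset_ns);
        if (atomic_load(&v->seq) == start)
            break;                    /* consistent snapshot */
    }

    /* The extra counter: has the device seen a disruption that the
     * kernel has not processed yet? */
    if (seen != atomic_load(&d->disruption_count))
        return false;

    *offset_ns = off;
    return true;
}
```

This only narrows the race rather than closing it: a disruption landing just after the counter check looks the same as one landing just after the read returns. It also means userspace still reads one field of the device structure, so the compartmentalization is partial.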