On Fri, Jul 26, 2024 at 01:09:24AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2024 at 10:29:18PM +0100, David Woodhouse wrote:
> > > > > Then can't we fix it by interrupting all CPUs right after LM?
> > > > >
> > > > > To me that seems like a cleaner approach - we then compartmentalize
> > > > > the ABI issue - kernel has its own ABI against userspace,
> > > > > devices have their own ABI against kernel.
> > > > > It'd mean we need a way to detect that interrupt was sent,
> > > > > maybe yet another counter inside that structure.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > By the way the same idea would work for snapshots -
> > > > > some people wanted to expose that info to userspace, too.
> >
> > Those people included me. I wanted to interrupt all the vCPUs, even the
> > ones which were in userspace at the moment of migration, and have the
> > kernel deal with passing it on to userspace via a different ABI.
> >
> > It ends up being complex and intricate, and requiring a lot of new
> > kernel and userspace support. I gave up on it in the end for snapshots,
> > and didn't go there again for this.
>
> Maybe because you insist on using ACPI?
> I see a fairly simple way to do it. For example, with virtio:
>
> one vq per CPU, with a single outstanding buffer,
> callback copies from the buffer into the userspace
> visible memory.
>
> Want me to show you the code?

Couldn't resist, so I wrote a bit of this code.
Fundamentally, we keep a copy of the hypervisor ABI in the device:

	struct virtclk_info {
		spinlock_t lock;	/* protects abi */
		struct vmclock_abi abi;
	};

Each vq has its own copy:

	struct virtqueue_info {
		struct scatterlist sg[1];
		struct vmclock_abi abi;
	};

We add it during probe:

	sg_init_one(vqi->sg, &vqi->abi, sizeof(vqi->abi));
	virtqueue_add_inbuf(vq, vqi->sg, 1, &vqi->abi, GFP_ATOMIC);

We set the affinity for each vq:

	for (i = 0; i < num_online_cpus(); i++)
		virtqueue_set_affinity(vci->vq[i], cpumask_of(i));

(virtio net does it, and it handles CPU hotplug as well)

Each vq callback would do:

	static void vmclock_cb(struct virtqueue *vq)
	{
		struct virtclk_info *vci = vq->vdev->priv;
		struct virtqueue_info *vqi = vq->priv;
		void *buf;
		unsigned int len;

		buf = virtqueue_get_buf(vq, &len);
		if (!buf)
			return;

		BUG_ON(buf != &vqi->abi);

		spin_lock(&vci->lock);
		/* Only touch the shared copy if this vq saw something new */
		if (memcmp(&vci->abi, &vqi->abi, sizeof(vqi->abi))) {
			memcpy(&vci->abi, &vqi->abi, sizeof(vqi->abi));
		}

		/* Update the userspace visible structure now */
		.....

		/* Re-add the buffer */
		virtqueue_add_inbuf(vq, vqi->sg, 1, &vqi->abi, GFP_ATOMIC);

		spin_unlock(&vci->lock);
	}

That's it!
Where's the problem here?

-- 
MST
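
For illustration only, the compare-and-copy step of the callback can be modelled in plain userspace C. This is a sketch under loud assumptions, not driver code: struct model_abi and its fields are hypothetical stand-ins for struct vmclock_abi, a C11 atomic_flag stands in for the kernel spinlock, and the virtqueue itself is reduced to a buffer the caller fills in. The updates counter is an addition that makes it visible how many callbacks actually changed the shared copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vmclock_abi; the real layout differs. */
struct model_abi {
	uint64_t generation;
	uint64_t value;
};

/* Device-wide state: the copy that would be published to userspace. */
struct model_dev {
	atomic_flag lock;	/* models the kernel spinlock */
	struct model_abi abi;
	unsigned int updates;	/* times the shared copy actually changed */
};

/* Per-queue state: the single outstanding buffer the device writes to. */
struct model_vq {
	struct model_abi abi;
};

static void model_dev_init(struct model_dev *dev)
{
	memset(dev, 0, sizeof(*dev));
	atomic_flag_clear(&dev->lock);
}

/*
 * Models the body of the vq callback: under the lock, copy the per-queue
 * buffer into the shared copy only if the two differ.
 * Returns true if the shared copy was updated.
 */
static bool model_vq_callback(struct model_dev *dev, const struct model_vq *vq)
{
	bool changed = false;

	assert(dev != NULL && vq != NULL);

	while (atomic_flag_test_and_set_explicit(&dev->lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */

	if (memcmp(&dev->abi, &vq->abi, sizeof(vq->abi)) != 0) {
		memcpy(&dev->abi, &vq->abi, sizeof(vq->abi));
		dev->updates++;
		changed = true;
	}

	atomic_flag_clear_explicit(&dev->lock, memory_order_release);
	return changed;
}
```

With one model_vq per CPU all receiving the same contents, only the first callback to run changes the shared copy; the others find it already up to date and return false.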