From: Alexander Graf <graf@xxxxxxxxxx> Sent: Monday, May 2, 2022 11:35 AM
>
> On 02.05.22 20:04, Jason A. Donenfeld wrote:
> > Hey Lennart,
> >
> > On Mon, May 02, 2022 at 06:51:19PM +0200, Lennart Poettering wrote:
> >> On Mo, 02.05.22 18:12, Jason A. Donenfeld (Jason@xxxxxxxxx) wrote:
> >>
> >>>>> In order to inform userspace of virtual machine forks, this commit adds
> >>>>> a "fork_event" sysctl, which does not return any data, but allows
> >>>>> userspace processes to poll() on it for notification of VM forks.
> >>>>>
> >>>>> It avoids exposing the actual vmgenid from the hypervisor to userspace,
> >>>>> in case there is any randomness value in keeping it secret. Rather,
> >>>>> userspace is expected to simply use getrandom() if it wants a fresh
> >>>>> value.
> >>>> Wouldn't it make sense to expose a monotonic 64bit counter of detected
> >>>> VM forks since boot through read()? It might be interesting to know
> >>>> for userspace how many forks it missed the fork events for. Moreover it
> >>>> might be interesting to userspace to know if any fork happened so far
> >>>> *at* *all*, by checking if the counter is non-zero.
> >>> "Might be interesting" is different from "definitely useful". I'm not
> >>> going to add this without a clear use case. This feature is pretty
> >>> narrowly scoped in its objectives right now, and I intend to keep it
> >>> that way if possible.
> >> Sure, whatever. I mean, if you think it's preferable to have 3 API
> >> abstractions for the same concept each for its special usecase, then
> >> that's certainly one way to do things. I personally would try to
> >> figure out a modicum of generalization for things like this. But maybe
> >> that's just me…
> >>
> >> I can just tell you, that in systemd we'd have a usecase for consuming
> >> such a generation counter: we try to provide stable MAC addresses for
> >> synthetic network interfaces managed by networkd, so we hash them from
> >> /etc/machine-id, but otoh people also want them to change when they
> >> clone their VMs. We could very nicely solve this if we had a
> >> generation counter easily accessible from userspace, that starts at 0
> >> initially. Because then we can hash as we always did when the counter
> >> is zero, but otherwise use something else, possibly hashed from the
> >> generation counter.
> > This doesn't work, because you could have memory-A split into memory-A.1
> > and memory-A.2, and both A.2 and A.1 would ++counter, and wind up with
> > the same new value "2". The solution is to instead have the hypervisor
> > pass a unique value and a counter. We currently have a 16 byte unique
> > value from the hypervisor, which I'm keeping as a kernel space secret
> > for the RNG; we're waiting on a word-sized monotonic counter interface
> > from hypervisors in the future. When we have the latter, then we can
> > start talking about mmapable things. Your use case would probably be
> > served by exposing that 16-byte unique value (hashed with some constant
> > for safety I suppose), but I'm hesitant to start going down that route
> > all at once, especially if we're to have a more useful counter in the
> > future.
>
>
> Michael, since we already changed the CID in the spec, can we add a
> property to the device that indicates the first 4 bytes of the UUID will
> always be different between parent and child?
>
> That should give us the ability to mmap the vmgenid directly to user
> space and act based on a simple u32 compare for clone notification, no?
>

I'm not ignoring this request, but my interpretation of the subsequent
discussion is that it's probably not the path that we want to go down
anyway. Is that a correct interpretation?

Also, the chances of getting the Windows team to focus on a revision to
the spec are not high, especially a revision that has new semantics. :-(
Getting the new CID added was a relatively low bar, though I'm still
trying to get the publicly available version of the spec updated to
include the new CID.

Michael
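
For anyone following the quoted argument about why a guest-side fork
counter alone cannot distinguish sibling clones, here is a toy model in
Python. It is only a sketch under stated assumptions: VmState and
fork_vm are hypothetical names made up for illustration, os.urandom
merely stands in for the 16-byte value the hypervisor supplies, and
none of this is kernel or vmgenid driver code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VmState:
    """Toy snapshot of guest-visible state (hypothetical, for illustration)."""
    counter: int      # guest-maintained fork counter
    unique_id: bytes  # stand-in for the hypervisor's 16-byte unique value


def fork_vm(parent: VmState) -> VmState:
    """Model one clone of a snapshot.

    The guest increments its own counter on detecting the fork, and the
    hypervisor (modelled here with os.urandom) provides a fresh unique value.
    """
    return VmState(counter=parent.counter + 1, unique_id=os.urandom(16))


# memory-A is cloned twice from the same snapshot, giving A.1 and A.2.
a = VmState(counter=1, unique_id=os.urandom(16))
a1 = fork_vm(a)
a2 = fork_vm(a)

# Both siblings arrive at the same counter value, so anything derived
# from the counter alone (such as a hashed MAC address) would collide.
same_counter = a1.counter == a2.counter

# The per-clone unique value is what actually tells the siblings apart.
same_unique_id = a1.unique_id == a2.unique_id

print("counters collide:", same_counter)
print("unique ids collide:", same_unique_id)
```

In this model both clones end up with counter value 2, matching the
example in the quoted text, while their unique values differ (barring
a random 128-bit collision).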