Recently, a proposal has been published [1] for a new feature in the VirtIO RNG device which allows the device to report "entropy leaks" to the guest VM. Such an event occurs, for example, when we take a VM snapshot or when we restore a VM from a snapshot.

The feature allows the guest to request that certain operations be performed upon an entropy leak event. When such an event occurs, the device handles the requests and adds the request buffers to the used queue. Adding these buffers to the used queue serves as a notification to the guest about the entropy leak event. The proposed changes describe two types of requests: (1) fill a buffer in guest memory with random bytes and (2) perform a memory copy between two buffers in guest memory.

The mechanism provides functionality similar to Microsoft's Virtual Machine Generation ID and can be used to re-seed the kernel's PRNG upon taking a VM snapshot or resuming from one. Additionally, it allows us to avoid the race condition that exists with our VMGENID implementation: the window between the moment a VM is resumed after a "leak event" and the moment the ACPI notification is handled and the new entropy is added. Finally, it can be built upon to provide a mechanism for notifying user-space about such events.

The first patch of this series extends the current virtio-rng driver to implement the new feature. It ensures that there is always a request queued to get some random bytes from the device in the event of an entropy leak, and it feeds these bytes to the kernel as entropy through `add_device_randomness`.

The second patch also adds a copy-on-leak command to the queue, implementing the idea of a generation counter that has previously been part of the VMGENID saga. It then exposes the value of the generation counter over a sysfs file. User-space can read, mmap and poll on the file in order to be notified about entropy leak events.
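To make the user-space side more concrete, here is a minimal sketch of how an application might watch the generation counter through `mmap`. The helper names (`leak_watch_open`, `leak_watch_changed`, `leak_watch_close`) are hypothetical and not part of the series. The path of the sysfs file is left as a parameter because it is not spelled out in this letter, and the 32-bit counter width is an assumption made for illustration only.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper state; none of these names come from the patches. */
struct leak_watch {
    const volatile uint32_t *gen; /* read-only mapping of the counter (width assumed) */
    uint32_t seen;                /* counter value cached at the last check */
    size_t map_len;               /* length passed to mmap(), needed for munmap() */
};

/* Map the counter file at `path` and cache its current value. Returns 0 on success. */
static int leak_watch_open(struct leak_watch *w, const char *path)
{
    assert(w != NULL && path != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    w->gen = p;
    w->map_len = (size_t)page;
    w->seen = *w->gen;
    return 0;
}

/* Returns 1 if the counter changed since the previous check, 0 otherwise. */
static int leak_watch_changed(struct leak_watch *w)
{
    uint32_t now = *w->gen; /* volatile read: always re-read the mapped value */

    if (now == w->seen)
        return 0;
    w->seen = now;
    return 1;
}

/* Drop the mapping. */
static void leak_watch_close(struct leak_watch *w)
{
    munmap((void *)w->gen, w->map_len);
    w->gen = NULL;
}
```

An application would call `leak_watch_changed()` right before using any secret derived from earlier randomness, and regenerate that secret whenever the function returns 1.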
I have performed basic tests of the user-space interfaces using a version of Firecracker in which I implemented virtio-rng with the proposed features. Instructions on how to replicate this can be found here: https://github.com/bchalios/virtio-snapsafe-example

The patchset does not solve all problems. We do not define an API that lets other parts of the kernel use the new functionality directly (i.e. add commands to the queue), mainly because I'm not sure what the correct API would be. I was toying with the idea of extending `struct hwrng` with two new hooks that would be implemented only by virtio-rng, but I'm not sure I like it, so I am open to suggestions.

As a result of the above, the way we use the functionality to add new entropy, i.e. calling `add_device_randomness`, is as racy as the VMGENID case, since it relies on used buffers being handled by the virtio driver.

As for user-space, the `mmap` interface *is* race-free. Changes in the generation counter will be observable by user applications the moment the VM's vCPUs resume. However, the `poll` interface is not: `sysfs_notify` is also called when the virtio driver handles used buffers. I am not sure I have a solution for this last one.

By posting this, I hope we can resume the discussion about solving the above issues (or any other issue that I haven't thought of), especially with regard to providing a mechanism suitable for user-space notifications.

Cheers,
Babis

Changes in v2:
  - fix kbuild warnings

Babis Chalios (2):
  virtio-rng: implement entropy leak feature
  virtio-rng: add sysfs entries for leak detection

 drivers/char/hw_random/virtio-rng.c | 372 +++++++++++++++++++++++++++-
 include/uapi/linux/virtio_rng.h     |   3 +
 2 files changed, 368 insertions(+), 7 deletions(-)

-- 
2.38.1