On 12/01/2021, 15:09, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: On Tue, Jan 12, 2021 at 02:15:59PM +0200, Adrian Catangiu wrote: > +3) Mapped memory polling simplified example:: > + > + /* > + * app/library function that provides cached secrets > + */ > + char * safe_cached_secret(app_data_t *app) > + { > + char *secret; > + volatile unsigned *const genid_ptr = get_sysgenid_mapping(app); > + again: > + secret = __cached_secret(app); > + > + if (unlikely(*genid_ptr != app->cached_genid)) { > + app->cached_genid = *genid_ptr; > + barrier(); > + > + // rebuild world then confirm the genid update (thru write) > + rebuild_caches(app); > + > + ack_sysgenid_update(app); > + > + goto again; > + } > + > + return secret; Hmm. This seems to rely on the assumption that if you have read the ID and it did not change, then all is well. Problem is, in the interrupt driven implementation this is not a safe assumption to make: ID from hypervisor might have changed but interrupt could be delayed. If we map the hypervisor ID to userspace then we won't have this race ... worth worry about? why not? This is a very good point! Unfortunately, there is no immediate solution here. The current patch-set makes this trade-off in order to gain the genericity of a system-level generation ID which is not limited to VMs usecases, but can also be used with things like CRIU. Directly mapping the vmgenid UUID to userspace was the initial design of this patch-set (see v1), but it was argued against and evolved into current state. I would personally rather enhance the HW support (vmgenid device for example) to also expose a configurable u32 Vm Gen Counter alongside the 128-bit UUID; and add support in SysGenID to offload the counter to HW when applicable. The broader view is we need to strike the right balance between a functional, safe mechanism with today's technology, but also design a future-proof generic mechanism. Fixing the race you mentioned in SysGenID only moves the race a bit further up the stack - you generate the secret race-free but the secret can still be duplicated in the next layer. To make any mechanism completely safe we need to conceptually disconnect ourselves from believing that a restored snapshot is immediately available. There needs to be some entity that moves the restored VM/container/system from a transient state to "available". That entity can be a process inside the VM, but it can also be something outside the VM, in the hypervisor or in the stack surrounding it. That entity is responsible for managing the transition of the VM or container from transient -> available. It is responsible for not allowing the VM/ container to be used or usable until that transition is complete. In the first generations of VM clones and CRIU production deployments, I expect this entity to be a stack-specific component with intimate knowledge of the system components, transient states, lifecycle, etc. Which this patch-set enables. Thanks, Adrian. Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.