Re: [PATCH v4 1/2] drivers/misc: sysgenid: add system generation id driver

"Catangiu, Adrian Costin" <acatan@xxxxxxxxxx> · Thu, 21 Jan 2021 12:48:23 +0000

On 12/01/2021, 15:09, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:

    On Tue, Jan 12, 2021 at 02:15:59PM +0200, Adrian Catangiu wrote:

    > +3) Mapped memory polling simplified example::
    > +
    > +     /*
    > +      * app/library function that provides cached secrets
    > +      */
    > +     char * safe_cached_secret(app_data_t *app)
    > +     {
    > +             char *secret;
    > +             volatile unsigned *const genid_ptr = get_sysgenid_mapping(app);
    > +     again:
    > +             secret = __cached_secret(app);
    > +
    > +             if (unlikely(*genid_ptr != app->cached_genid)) {
    > +                     app->cached_genid = *genid_ptr;
    > +                     barrier();
    > +
    > +                     // rebuild world then confirm the genid update (thru write)
    > +                     rebuild_caches(app);
    > +
    > +                     ack_sysgenid_update(app);
    > +
    > +                     goto again;
    > +             }
    > +
    > +             return secret;

    Hmm. This seems to rely on the assumption that if you have
    read the ID and it did not change, then all is well.

    Problem is, in the interrupt driven implementation
    this is not a safe assumption to make: ID
    from hypervisor might have changed but interrupt
    could be delayed.

    If we map the hypervisor ID to userspace then we won't
    have this race ... worth worry about? why not?

This is a very good point! Unfortunately, there is no immediate solution here.
The current patch-set makes this trade-off in order to gain the genericity of
a system-level generation ID which is not limited to VMs usecases, but can also
be used with things like CRIU.

Directly mapping the vmgenid UUID to userspace was the initial design of this
patch-set (see v1), but it was argued against and evolved into current state.

I would personally rather enhance the HW support (vmgenid device for example)
to also expose a configurable u32 Vm Gen Counter alongside the 128-bit UUID;
and add support in SysGenID to offload the counter to HW when applicable.

The broader view is we need to strike the right balance between a functional,
safe mechanism with today's technology, but also design a future-proof generic
mechanism.

Fixing the race you mentioned in SysGenID only moves the race a bit further up
the stack - you generate the secret race-free but the secret can still be duplicated
in the next layer. To make any mechanism completely safe we need to conceptually
disconnect ourselves from believing that a restored snapshot is immediately available.
There needs to be some entity that moves the restored VM/container/system
from a transient state to "available". That entity can be a process inside the VM,
but it can also be something outside the VM, in the hypervisor or in the stack
surrounding it. That entity is responsible for managing the transition of the VM
or container from transient -> available. It is responsible for not allowing the VM/
container to be used or usable until that transition is complete.

In the first generations of VM clones and CRIU production deployments, I expect
this entity to be a stack-specific component with intimate knowledge of the system
components, transient states, lifecycle, etc. Which this patch-set enables.

Thanks,
Adrian.

Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in Romania. Registration number J22/2621/2005.