Re: [PATCH v2 4/4] livepatch/shadow: Add garbage collection of shadow variables

Petr Mladek <pmladek@xxxxxxxx> · Fri, 11 Nov 2022 10:55:38 +0100

On Fri 2022-11-11 10:20:44, Nicolai Stange wrote:
> Hi all,
> 
> Petr Mladek <pmladek@xxxxxxxx> writes:
> 
> > On Thu 2022-11-03 18:03:27, Josh Poimboeuf wrote:
> >> On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote:
> >> > The life of shadow variables is not completely trivial to maintain.
> >> > They might be used by more livepatches and more livepatched objects.
> >> > They should stay as long as there is any user.
> >> > 
> >> > In practice, it requires to implement reference counting in callbacks
> >> > of all users. They should register all the user and remove the shadow
> >> > variables only when there is no user left.
> >> > 
> >> > This patch hides the reference counting into the klp_shadow API.
> >> > The counter is connected with the shadow variable @id. It requires
> >> > an API to take and release the reference. The release function also
> >> > calls the related dtor() when defined.
> >> > 
> >> > An easy solution would be to add some get_ref()/put_ref() API.
> >> > But it would need to get called from pre()/post_un() callbacks.
> >> > It might be easy to forget a callback and make it wrong.
> >> > 
> >> > A more safe approach is to associate the klp_shadow_type with
> >> > klp_objects that use the shadow variables. The livepatch core
> >> > code might then handle the reference counters on background.
> >> > 
> >> > The shadow variable type might then be added into a new @shadow_types
> >> > member of struct klp_object. They will get then automatically registered
> >> > and unregistered when the object is being livepatched. The registration
> >> > increments the reference count. Unregistration decreases the reference
> >> > count. All shadow variables of the given type are freed when the reference
> >> > count reaches zero.
> >> > 
> >> > All klp_shadow_alloc/get/free functions also checks whether the requested
> >> > type is registered. It will help to catch missing registration and might
> >> > also help to catch eventual races.
> >> 
> >> Is there a reason the shadow variable lifetime is tied to klp_object
> >> rather than klp_patch?
> >
> > Good question!
> >
> > My thinking was that shadow variables might be tight to variables
> > from a particular livepatched module. It would make sense to free them
> > when the only user (livepatched module) is removed.
> >
> > The question is if it is really important.
> 
> I don't think so. For shadow variables attached to "real" data objects,
> I would expect those objects to be gone by the time the patched ko gets
> removed. For dummy shadows attached to NULL (see below), used for
> handing over shared state between atomic-replace livepatch modules, the
> memory footprint is negligible, I'd say.
> 
> OTOH, callbacks are per klp_object already, so for API consistency, it
> might make sense to stick to this pattern for the shadows as well. But I
> don't have a real opinion on this.

Good point!

> >From my experience, there are basically two relevant usage patterns of
> shadow variables.
> 1.) To hand over global state from one sublivepatch to its pendant in
>     the to-be-applied livepatch module. Example: a new global mutex or
>     alike.
> 2.) The "regular" intended usage, attaching schadow variables to real
>     (data) objects.
> 
> To manage lifetime for 1.), we usually implement some refcount scheme,
> managed from the livepatches' module_init()/_exit(): the next livepatch
> would subscribe to the shared state before the previous one got a chance
> to release it. This works in practice, but the code related to it is
> tedious to write and quite verbose.
> 
> The second usage pattern is much more difficult to implement correctly
> in light of possible livepatch downgrades to a subset of
> sublivepatches. Usually a sublivepatch making use of a shadow variable
> attached to real objects would livepatch the associated object's
> destruction code to free up the associated shadow, if any. If the next
> livepatch to be applied happened to not contain this sublivepatch in
> question as well, the destruction code would effectively become
> unpatched, and any existing shadows leaked. Depending on the object type
> in question, this memory leakage might or might not be an actual
> problem, but it isn't nice either way.
> 
> Often, there's a more subtle issue with the latter usecase though: the
> shadow continues to exist, but becomes unmaintained once the transitions
> has started. If said sublivepatch happens to become reactivated later
> on, it would potentially find stale shadows, and these could even get
> wrongly associated with a completely different object which happened to
> get allocated at the same memory address. Depending on the shadow type,
> this might or might not be Ok. New per-object locks or a "TLB flush
> needed" boolean would probably be Ok, but some kind of refcount would
> certainly not. There's not much which could be done from the pre-unpatch
> callbacks, because these aren't getting invoked for atomic-replace
> downgrades.
> 
> We had quite some discussion internally on how to best approach these
> limitations, the outcome being (IIRC), that a more versatile callbacks
> support would perhaps quickly become too complex or error-prone to use
> correctly. So Petr proposed this garbage collection/refcounting
> mechanism posted here, which would solve the memory leakage issue as a
> first step (and would make shadow variable usage less verbose IMO).
> 
> The consistency problem would still not be fully solved: consider a
> refcount-like shadow, where during the transition some acquirer had been
> unpatched already, while a releaser has not yet. But my hope is that we
> can later build on this work here and somehow resolve this as well.
> 
> > Nicolai, your have the practical experience. Should the reference
> > counting be per-livepatched object or per-livepatch, please?
> 
> See above, I think it won't matter much from a functionality POV.

I would personally keep it tied together with the livepatched object
just to be on the safe side.

Is this acceptable for kPatch users in the end?

Best Regards,
Petr