Re: [PATCH v2 4/4] livepatch/shadow: Add garbage collection of shadow variables

Petr Mladek <pmladek@xxxxxxxx> · Tue, 17 Jan 2023 16:01:57 +0100

Hi,

I am sorry for answering this so late. It somehow fallen under cracks.

On Sun 2022-11-13 10:51:38, Josh Poimboeuf wrote:
> On Fri, Nov 11, 2022 at 10:55:38AM +0100, Petr Mladek wrote:
> > > >From my experience, there are basically two relevant usage patterns of
> > > shadow variables.
> > > 1.) To hand over global state from one sublivepatch to its pendant in
> > >     the to-be-applied livepatch module. Example: a new global mutex or
> > >     alike.
> > > 2.) The "regular" intended usage, attaching shadow variables to real
> > >     (data) objects.
> > > 
> > > To manage lifetime for 1.), we usually implement some refcount scheme,
> > > managed from the livepatches' module_init()/_exit(): the next livepatch
> > > would subscribe to the shared state before the previous one got a chance
> > > to release it. This works in practice, but the code related to it is
> > > tedious to write and quite verbose.
> > > 
> > > The second usage pattern is much more difficult to implement correctly
> > > in light of possible livepatch downgrades to a subset of
> > > sublivepatches. Usually a sublivepatch making use of a shadow variable
> > > attached to real objects would livepatch the associated object's
> > > destruction code to free up the associated shadow, if any. If the next
> > > livepatch to be applied happened to not contain this sublivepatch in
> > > question as well, the destruction code would effectively become
> > > unpatched, and any existing shadows leaked. Depending on the object type
> > > in question, this memory leakage might or might not be an actual
> > > problem, but it isn't nice either way.
> > > 
> > > Often, there's a more subtle issue with the latter usecase though: the
> > > shadow continues to exist, but becomes unmaintained once the transitions
> > > has started. If said sublivepatch happens to become reactivated later
> > > on, it would potentially find stale shadows, and these could even get
> > > wrongly associated with a completely different object which happened to
> > > get allocated at the same memory address. Depending on the shadow type,
> > > this might or might not be Ok. New per-object locks or a "TLB flush
> > > needed" boolean would probably be Ok, but some kind of refcount would
> > > certainly not. There's not much which could be done from the pre-unpatch
> > > callbacks, because these aren't getting invoked for atomic-replace
> > > downgrades.

IMHO, this is the reason why we should make it per-object.

If the shadow variable was used by a livepatched module and we remove
this module then the shadow variables would get unmaintained. It would
results in the problem described in this paragraph.

> > > We had quite some discussion internally on how to best approach these
> > > limitations, the outcome being (IIRC), that a more versatile callbacks
> > > support would perhaps quickly become too complex or error-prone to use
> > > correctly. So Petr proposed this garbage collection/refcounting
> > > mechanism posted here, which would solve the memory leakage issue as a
> > > first step (and would make shadow variable usage less verbose IMO).
> > > 
> > > The consistency problem would still not be fully solved: consider a
> > > refcount-like shadow, where during the transition some acquirer had been
> > > unpatched already, while a releaser has not yet. But my hope is that we
> > > can later build on this work here and somehow resolve this as well.
> 
> It would be great to have all this motivation for the new feature
> documented in shadow-vars.rst.
> 
> > > > Nicolai, your have the practical experience. Should the reference
> > > > counting be per-livepatched object or per-livepatch, please?
> > > 
> > > See above, I think it won't matter much from a functionality POV.
> > 
> > I would personally keep it tied together with the livepatched object
> > just to be on the safe side.
> 
> If downgrades are going to be commonplace then I agree my automatic
> detection idea wouldn't work so well.  And ref counting does make sense.
> 
> However I'm still not convinced that per-object is the way to go.
> Doesn't that create more room for human error?  There's no way to
> detect-and-warn if the wrong object is used, or if the variable is used
> by multiple objects but only one of them is listed.

I agree that these mistakes might happen. And I really do not see
any reasonable way how to auto-detect which livepatched object uses
which shadow variables.

Well, the per-object registration allows to register all shadow
types for the "vmlinux" object that is always loaded. It would
work as you suggested. I mean that it would keep the shadow variables
registered as long as the livepatch is loaded.

But I would still like to keep the registration per-object.
It would allow to handle re-load of livepatched modules
the right way when needed.

And I agree that we should document these pitfalls in the documentation.

> Per-patch shadow variable types would be easy to detect and warn on
> misuse.  And easy to automate in patch author tooling.
> 
> Also, I'm not crazy about the new API.  It's even more confusing than
> before.

I do not think that it made the API much worse. It actually made some
things more safe and easier.

The ctor/dtor callbacks are defined by the shadow type. It removes
the risk to passing a wrong one.

Also there is not need to free the unused shadow variables in
post_un() callbacks. It was actually pretty tricky because
it required the reference counting.

Best Regards,
Petr