On Fri 2022-11-11 10:20:44, Nicolai Stange wrote: > Hi all, > > Petr Mladek <pmladek@xxxxxxxx> writes: > > > On Thu 2022-11-03 18:03:27, Josh Poimboeuf wrote: > >> On Wed, Oct 26, 2022 at 04:41:22PM -0300, Marcos Paulo de Souza wrote: > >> > The life of shadow variables is not completely trivial to maintain. > >> > They might be used by more livepatches and more livepatched objects. > >> > They should stay as long as there is any user. > >> > > >> > In practice, it requires to implement reference counting in callbacks > >> > of all users. They should register all the user and remove the shadow > >> > variables only when there is no user left. > >> > > >> > This patch hides the reference counting into the klp_shadow API. > >> > The counter is connected with the shadow variable @id. It requires > >> > an API to take and release the reference. The release function also > >> > calls the related dtor() when defined. > >> > > >> > An easy solution would be to add some get_ref()/put_ref() API. > >> > But it would need to get called from pre()/post_un() callbacks. > >> > It might be easy to forget a callback and make it wrong. > >> > > >> > A more safe approach is to associate the klp_shadow_type with > >> > klp_objects that use the shadow variables. The livepatch core > >> > code might then handle the reference counters on background. > >> > > >> > The shadow variable type might then be added into a new @shadow_types > >> > member of struct klp_object. They will get then automatically registered > >> > and unregistered when the object is being livepatched. The registration > >> > increments the reference count. Unregistration decreases the reference > >> > count. All shadow variables of the given type are freed when the reference > >> > count reaches zero. > >> > > >> > All klp_shadow_alloc/get/free functions also checks whether the requested > >> > type is registered. It will help to catch missing registration and might > >> > also help to catch eventual races. > >> > >> Is there a reason the shadow variable lifetime is tied to klp_object > >> rather than klp_patch? > > > > Good question! > > > > My thinking was that shadow variables might be tight to variables > > from a particular livepatched module. It would make sense to free them > > when the only user (livepatched module) is removed. > > > > The question is if it is really important. > > I don't think so. For shadow variables attached to "real" data objects, > I would expect those objects to be gone by the time the patched ko gets > removed. For dummy shadows attached to NULL (see below), used for > handing over shared state between atomic-replace livepatch modules, the > memory footprint is negligible, I'd say. > > OTOH, callbacks are per klp_object already, so for API consistency, it > might make sense to stick to this pattern for the shadows as well. But I > don't have a real opinion on this. Good point! > >From my experience, there are basically two relevant usage patterns of > shadow variables. > 1.) To hand over global state from one sublivepatch to its pendant in > the to-be-applied livepatch module. Example: a new global mutex or > alike. > 2.) The "regular" intended usage, attaching schadow variables to real > (data) objects. > > To manage lifetime for 1.), we usually implement some refcount scheme, > managed from the livepatches' module_init()/_exit(): the next livepatch > would subscribe to the shared state before the previous one got a chance > to release it. This works in practice, but the code related to it is > tedious to write and quite verbose. > > The second usage pattern is much more difficult to implement correctly > in light of possible livepatch downgrades to a subset of > sublivepatches. Usually a sublivepatch making use of a shadow variable > attached to real objects would livepatch the associated object's > destruction code to free up the associated shadow, if any. If the next > livepatch to be applied happened to not contain this sublivepatch in > question as well, the destruction code would effectively become > unpatched, and any existing shadows leaked. Depending on the object type > in question, this memory leakage might or might not be an actual > problem, but it isn't nice either way. > > Often, there's a more subtle issue with the latter usecase though: the > shadow continues to exist, but becomes unmaintained once the transitions > has started. If said sublivepatch happens to become reactivated later > on, it would potentially find stale shadows, and these could even get > wrongly associated with a completely different object which happened to > get allocated at the same memory address. Depending on the shadow type, > this might or might not be Ok. New per-object locks or a "TLB flush > needed" boolean would probably be Ok, but some kind of refcount would > certainly not. There's not much which could be done from the pre-unpatch > callbacks, because these aren't getting invoked for atomic-replace > downgrades. > > We had quite some discussion internally on how to best approach these > limitations, the outcome being (IIRC), that a more versatile callbacks > support would perhaps quickly become too complex or error-prone to use > correctly. So Petr proposed this garbage collection/refcounting > mechanism posted here, which would solve the memory leakage issue as a > first step (and would make shadow variable usage less verbose IMO). > > The consistency problem would still not be fully solved: consider a > refcount-like shadow, where during the transition some acquirer had been > unpatched already, while a releaser has not yet. But my hope is that we > can later build on this work here and somehow resolve this as well. > > > Nicolai, your have the practical experience. Should the reference > > counting be per-livepatched object or per-livepatch, please? > > See above, I think it won't matter much from a functionality POV. I would personally keep it tied together with the livepatched object just to be on the safe side. Is this acceptable for kPatch users in the end? Best Regards, Petr