Hi, I am sorry for answering this so late. It somehow fallen under cracks. On Sun 2022-11-13 10:51:38, Josh Poimboeuf wrote: > On Fri, Nov 11, 2022 at 10:55:38AM +0100, Petr Mladek wrote: > > > >From my experience, there are basically two relevant usage patterns of > > > shadow variables. > > > 1.) To hand over global state from one sublivepatch to its pendant in > > > the to-be-applied livepatch module. Example: a new global mutex or > > > alike. > > > 2.) The "regular" intended usage, attaching shadow variables to real > > > (data) objects. > > > > > > To manage lifetime for 1.), we usually implement some refcount scheme, > > > managed from the livepatches' module_init()/_exit(): the next livepatch > > > would subscribe to the shared state before the previous one got a chance > > > to release it. This works in practice, but the code related to it is > > > tedious to write and quite verbose. > > > > > > The second usage pattern is much more difficult to implement correctly > > > in light of possible livepatch downgrades to a subset of > > > sublivepatches. Usually a sublivepatch making use of a shadow variable > > > attached to real objects would livepatch the associated object's > > > destruction code to free up the associated shadow, if any. If the next > > > livepatch to be applied happened to not contain this sublivepatch in > > > question as well, the destruction code would effectively become > > > unpatched, and any existing shadows leaked. Depending on the object type > > > in question, this memory leakage might or might not be an actual > > > problem, but it isn't nice either way. > > > > > > Often, there's a more subtle issue with the latter usecase though: the > > > shadow continues to exist, but becomes unmaintained once the transitions > > > has started. If said sublivepatch happens to become reactivated later > > > on, it would potentially find stale shadows, and these could even get > > > wrongly associated with a completely different object which happened to > > > get allocated at the same memory address. Depending on the shadow type, > > > this might or might not be Ok. New per-object locks or a "TLB flush > > > needed" boolean would probably be Ok, but some kind of refcount would > > > certainly not. There's not much which could be done from the pre-unpatch > > > callbacks, because these aren't getting invoked for atomic-replace > > > downgrades. IMHO, this is the reason why we should make it per-object. If the shadow variable was used by a livepatched module and we remove this module then the shadow variables would get unmaintained. It would results in the problem described in this paragraph. > > > We had quite some discussion internally on how to best approach these > > > limitations, the outcome being (IIRC), that a more versatile callbacks > > > support would perhaps quickly become too complex or error-prone to use > > > correctly. So Petr proposed this garbage collection/refcounting > > > mechanism posted here, which would solve the memory leakage issue as a > > > first step (and would make shadow variable usage less verbose IMO). > > > > > > The consistency problem would still not be fully solved: consider a > > > refcount-like shadow, where during the transition some acquirer had been > > > unpatched already, while a releaser has not yet. But my hope is that we > > > can later build on this work here and somehow resolve this as well. > > It would be great to have all this motivation for the new feature > documented in shadow-vars.rst. > > > > > Nicolai, your have the practical experience. Should the reference > > > > counting be per-livepatched object or per-livepatch, please? > > > > > > See above, I think it won't matter much from a functionality POV. > > > > I would personally keep it tied together with the livepatched object > > just to be on the safe side. > > If downgrades are going to be commonplace then I agree my automatic > detection idea wouldn't work so well. And ref counting does make sense. > > However I'm still not convinced that per-object is the way to go. > Doesn't that create more room for human error? There's no way to > detect-and-warn if the wrong object is used, or if the variable is used > by multiple objects but only one of them is listed. I agree that these mistakes might happen. And I really do not see any reasonable way how to auto-detect which livepatched object uses which shadow variables. Well, the per-object registration allows to register all shadow types for the "vmlinux" object that is always loaded. It would work as you suggested. I mean that it would keep the shadow variables registered as long as the livepatch is loaded. But I would still like to keep the registration per-object. It would allow to handle re-load of livepatched modules the right way when needed. And I agree that we should document these pitfalls in the documentation. > Per-patch shadow variable types would be easy to detect and warn on > misuse. And easy to automate in patch author tooling. > > Also, I'm not crazy about the new API. It's even more confusing than > before. I do not think that it made the API much worse. It actually made some things more safe and easier. The ctor/dtor callbacks are defined by the shadow type. It removes the risk to passing a wrong one. Also there is not need to free the unused shadow variables in post_un() callbacks. It was actually pretty tricky because it required the reference counting. Best Regards, Petr