Re: livepatch callback features

On 02/15/2019 05:58 AM, Nicolai Stange wrote:
Petr Mladek <pmladek@xxxxxxxx> writes:
On Thu 2019-02-14 11:47:42, Nicolai Stange wrote:
Joe Lawrence <joe.lawrence@xxxxxxxxxx> writes:
I see the following advantages of the dynamic approach:

      + easy to transfer (it just stays in the memory, some list)

      + Localized code (alloc/get only where needed, no need to
        define it in some global structures, lists).

      + One structure for a single feature all the time. No confusion
        which structure you work with. Temporary states might always
        be stored in local static variables and stored into
        the structure once the feature is fully migrated.

        I could imagine that some people might see this as
        a disadvantage though.

+ Clearer sysfs entry, if we choose to expose it, since it would be decoupled from any specific livepatch lifetime and could live under /sys/.../livepatch/features/
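  A hypothetical layout, just to illustrate (paths and feature names made
  up here):

      /sys/kernel/livepatch/features/L1TF/version
      /sys/kernel/livepatch/features/CVE2/version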

The disadvantages are:

     + allocation might fail (must be done in pre_patch())

Or module .init?

     + more complicated compatibility check (another callback
       to check provided features)

Ditto?

Features could still be stored in a list and the callbacks could get
access to current vs. this patch's feature set.  They wouldn't be
dynamic, but then the API supporting that could be dropped.
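Roughly like the following sketch -- the struct and field names are made
up here, this is not an existing API:

#include <linux/list.h>

struct klp_feature {
	struct list_head node;		/* linked into a core-maintained list */
	const char *name;		/* e.g. "L1TF" */
	unsigned int version;		/* monotonically increasing */
};

/* Statically defined in the patch module: */
static struct klp_feature my_features[] = {
	{ .name = "L1TF", .version = 1 },
	{ .name = "CVE2", .version = 2 },
};

/* The livepatch core would keep the currently active descriptors on a
 * list so that the pre/post (un)patch callbacks can compare the current
 * set against the incoming patch's set. */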

Right, so during a transition, access to klp_features from the previous
live patch would be possible only up to and including the pre-patch
callback, correct? That would probably be enough to "take over" in all
situations. If the next live patch is sticky, ->data ownership wouldn't
be a problem, because, as Petr said in reply to the stickiness patch,
the transition should be non-reversible in this case. If we're dealing
with non-sticky live patches, the post-unpatch callback would only get
called if the live patch hasn't been superseded by another one (thanks
to the atomic replace patchset) -- so we could tear down any ->data from
there.

Anyway, this is another motivation to remove object pointer
from the feature structure. In other words, the dynamic structure
should not point to any static code that might get later removed
(because it is error prone).

Yes, makes sense.

Agreed here.

- For follow-on livepatches, would their feature set need to match or
exceed the values for all of the current features, or only the
features that the new patch is advertising to support?  For example,
if we have current features:

   vmlinux : { { L1TF, v1 }, { CVE2, v2 } }

should the incoming cumulative patch need L1TF >= v1 and CVE2 >= v2 ?
Or does it only need L1TF >= v1 *or* CVE2 >= v2?

Dunno if cumulative livepatches change that requirement or not.

I have to revisit the semantics of non-cumulative live patches
after atomic replace, but I think that if the one to get applied is
cumulative, it's an "and" and for those which are non-cumulative it
could perhaps be an "or".

Good question. Well, I think that non-cumulative patches will not
need to check anything. They are primarily used to add unrelated fixes.
They might theoretically take over or extend an existing feature,
but then the authors/users are on their own, on shifting sands.

It may get confusing, but someone may want a set of features that handles scheduler fixes and an orthogonal set that touches a device driver. In that case, non-cumulative patches may want to handle these separately.

Hmm, I think we could require the versions to be >= what is currently
there, but only for the features the new non-cumulative live patch
supports?

I.e, for cumulative live patches, it would be: "all features currently
registered must be provided by a new cumulative live patch in the same
or higher version".

And for non-cumulative ones: "all features supported by a new
non-cumulative live patch must be provided in the same or higher version
than what's currently registered on the system".

Yeah, I think those would be the two major use cases we could choose to support. As I mentioned before, I'd be curious to see how the non-cumulative model scales after many livepatches are loaded.
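To make the two rules concrete, a rough sketch of such a check -- the
klp_feature struct and the lookup/iteration helpers below are
hypothetical, not existing kernel API:

/* Sketch only: for_each_registered_feature(), for_each_patch_feature(),
 * patch_find_feature() and find_registered_feature() are made-up helpers. */
static bool klp_features_compatible(struct klp_patch *patch)
{
	struct klp_feature *cur, *new;

	if (patch->replace) {
		/* Cumulative: every currently registered feature must be
		 * provided by the new patch, same or higher version. */
		for_each_registered_feature(cur) {
			new = patch_find_feature(patch, cur->name);
			if (!new || new->version < cur->version)
				return false;
		}
	} else {
		/* Non-cumulative: only the features this patch provides
		 * must be >= what is currently registered. */
		for_each_patch_feature(patch, new) {
			cur = find_registered_feature(new->name);
			if (cur && new->version < cur->version)
				return false;
		}
	}
	return true;
}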

Which brings another question: similar to the "sticky" case preventing
unpatching, we probably shouldn't allow the transition to get
reversed back to live patches not providing a particular feature at all
or only in a lower version?

Perhaps it would make sense to make stickiness a klp_feature property?

It might be unclear which patch is sticky with the dynamically
allocated structures. It might work with dynamically allocated
structures.

The question is whether it is worth the complexity. It is a global
state. I wonder if any callback would ever want to disable
the sticky flag once set.


I don't care too much whether the stickiness is managed per-patch
or OR'ed together from individual features.

But I think what is important is how the feature versioning affects
the reversibility of transitions.

That is, as we won't allow a transition to a sticky patch to be
reversed, we probably also don't want to allow a transition to a patch
with a feature of a higher version to be reversed -- just to be
consistent there?

I had a question about the reversibility of sticky patches in my Friday reply. If we choose to disallow that, then it might make sense to extend the same policy to (higher-versioned) features. But I had a question about callback execution with respect to reversal in that other mail that I'd like to sort out first.

Let me throw in a half-baked idea/dream of mine completely unrelated to
changing global semantics, safe upgrade paths and so on.

From time to time, the upstream mitigation for a certain CVE involves
new control knobs. For example, in the L1TF kvm case, there had been a
new option (whose name I forgot) to disable hyperthreading. The live
patch author currently has no standardized way of providing such a
control knob to live patching users and thus, all he/she can do about it is
to apply the somewhat coarse policy of "security over performance".
There are other examples for similar cases, which I forgot about now.

Long story short, it would be nice if we could, for certain subpatches
or "features" or whatever we call them, ask the user to enable them
explicitly through some /sys/.../enable file.

Huh, this is another level of complexity.

Indeed :)

Ignore it, I just wanted to remark that static klp_features could have
the potential to get reused/extended in the future. But that should not
concern us now -- apologies for the confusion.

<digression>

I wonder what the problems would be with creating a /sys interface that
might get transferred between livepatches.

Ok, in the L1TF kvm example, there's none.

But there are cases where I'd prefer to introduce my own, live-patch
specific control knob not having any equivalent in upstream.  Take the
mincore() patches currently under discussion. They slightly change
behaviour, and it would be nice, IMO, if users could opt in to that.

So we would need (a) persistent knowledge, i.e. the livepatch core keeps a memory of features, or a shadow-variable-style dictionary of "things", and (b) a persistent sysfs interface for this?

I wonder if we should have made the shadow variable API more generic, at least in name, so it could be used for this.
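E.g. (ab)using the existing shadow variable calls as a crude persistent
store, with a NULL object pointer so the entry isn't tied to any real
kernel object -- only a sketch, the ID and struct below are made up:

#include <linux/livepatch.h>

#define FEATURE_MINCORE_ID	0x6665a701	/* arbitrary, made-up ID */

struct feature_state {
	unsigned int version;
	bool enabled;
};

static int feature_state_ctor(void *obj, void *shadow_data, void *ctor_data)
{
	struct feature_state *fs = shadow_data;

	fs->version = 1;
	fs->enabled = false;
	return 0;
}

/* Called from pre_patch(): pick up state left behind by a previous
 * live patch, or allocate it on first use. */
static struct feature_state *get_mincore_state(void)
{
	return klp_shadow_get_or_alloc(NULL, FEATURE_MINCORE_ID,
				       sizeof(struct feature_state),
				       GFP_KERNEL, feature_state_ctor, NULL);
}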

And it would be great, if this delayed enabling of the mincore()
live-subpatch could happen with full consistency-model guarantees:
writing '1' to
/sys/.../foo-patch/mincore-cve-feature/enable
would trigger a full transition for the sub-patch instead of only
flipping some boolean flag.

Would this second transition take any more or less time than the patch transition? Just wondering if this could be implemented via module parameters and then incorporated as part of the usual livepatch transition/callback mechanisms.
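Something like this, maybe (a sketch only; the parameter name and the way
it gets wired into the callbacks are made up):

#include <linux/module.h>
#include <linux/livepatch.h>

/* Hypothetical opt-in knob, would show up under
 * /sys/module/<patch>/parameters/mincore_fix */
static bool mincore_fix;
module_param(mincore_fix, bool, 0644);
MODULE_PARM_DESC(mincore_fix, "opt in to the mincore() behaviour change");

static int mincore_pre_patch(struct klp_object *obj)
{
	/* The patched mincore() would consult mincore_fix at runtime;
	 * here we only log the choice at transition time. */
	pr_info("livepatch: mincore() fix %s\n",
		mincore_fix ? "enabled" : "disabled");
	return 0;
}

/* ...hooked up via the klp_object's .callbacks.pre_patch member. */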

Combining this idea with features kinda flips things on their head... instead of thinking of livepatches as providing a mere set of function replacements, one might think more in terms of features, feature versions and feature transitions. The livepatch module is just a delivery mechanism.

But as said, it's only a nice to have and not important for the
discussion here.

Still an interesting idea and worth tossing around for debate.

This would allow creating the same interface that is used by the
upstream implementation.

I think that diverting from the upstream interface and collecting all
live-patch specific sysfs 'enable' files at a central location might
have the advantage that monitoring with Nagios (or whatever is in use
nowadays) would be straightforward.

</digression>

There would be advantages to both approaches, but in the end I would agree that collecting them under a livepatch namespace would be safer, as we may not be able to ensure 1:1 semantics with their upstream counterparts.

-- Joe


