Re: livepatch callback features

Petr Mladek <pmladek@xxxxxxxx> writes:

> On Thu 2019-02-14 11:47:42, Nicolai Stange wrote:
>> Joe Lawrence <joe.lawrence@xxxxxxxxxx> writes:
>> > - Do features need to be tightly coupled to callbacks?  They are a
>> > convenient tool to implement a policy, but I wonder if we lose
>> > anything by moving this up to the klp_patch level and letting the core
>> > handle it. (Like the versioning scheme.)
>> 
>> If I'm understanding you correctly, you wonder whether having a
>> klp_object member in klp_feature would be necessary or not, right?
>> 
>> I think it doesn't matter much in these "safe upgrade" scenarios
>> being discussed here.
>
> The motivation to store the object pointer was to get easy access
> to the callbacks from the old patch. Heh, I did this when thinking
> about the API at the Plumbers conference and do not remember the use
> case any longer.
>
> Hmm, it might allow obsoleting a feature by calling the pre_unpatch()
> and post_unpatch() callbacks from the old patch in the pre_patch()
> and post_patch() callbacks of the new cumulative patch.
>
> Now, I think that it would actually complicate accessing the same
> feature from different objects.

Ah yes, now I remember. But I think this (== thinking about invoking the
previous patch's callbacks from the next one's) was before we decided to
simply disallow transitions to incompatible patches, right?


>> I have to admit though that I don't see the need for having the
>> klp_features dynamic or "shadow-variable-like": I think we could embed
>> those statically into klp_patch as well.
>
> I see the following advantages of the dynamic approach:
>
>      + easy to transfer (it just stays in the memory, some list)
>
>      + Localized code (alloc/get only where needed, no need to
>        define it in some global structures, lists).
>
>      + One structure for a single feature all the time. No confusion
>        about which structure you work with. Temporary state could
>        always be kept in local static variables and copied into
>        the structure once the feature is fully migrated.
>
>        I could imagine that some people might see this as
>        a disadvantage though.
>
>
>
> The disadvantages are:
>
>     + allocation might fail (must be done in pre_patch())
>
>     + more complicated compatibility check (another callback
>       to check provided features)


Ok. The only thing which remains unclear to me is how a live patch
would announce its set of supported features to the klp core.
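
Just to make my question concrete: with statically embedded features,
I could imagine something along these lines. Note that this is purely
a sketch of mine -- struct klp_feature does not exist, and struct
klp_patch has no .features member:

  /* Hypothetical: a statically declared feature set. */
  static struct klp_feature features[] = {
          { .name = "l1tf", .version = 1 },
          { .name = "cve2", .version = 2 },
          { } /* terminator */
  };

  static struct klp_patch patch = {
          .mod = THIS_MODULE,
          .objs = objs,
          .features = features,   /* made-up member */
  };

Whereas with the dynamic approach, I suppose this would be the
additional callback you mentioned above?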


>> > Features could still be stored in a list and the callbacks could get
>> > access to current vs. this patch's feature set.  They wouldn't be
>> > dynamic, but then the API supporting that could drop out.
>
>> Right, so during a transition, access to klp_features from the previous
>> live patch would be possible only up to and including the pre-patch
>> callback, correct? That would probably be enough to "take over" in all
>> situations. If the next live patch is sticky, ->data ownership wouldn't
>> be a problem, because, as Petr said in reply to the stickiness patch,
>> the transition should be non-reversible in this case. If we're dealing
>> with non-sticky live patches, the post-unpatch callback would only get
>> called if the live patch hasn't been superseded by another one (thanks
>> to the atomic replace patchset) -- so we could tear down any ->data from
>> there.
>
> Anyway, this is another motivation to remove the object pointer
> from the feature structure. In other words, the dynamic structure
> should not point to any static code that might later get removed
> (because that is error prone).

Yes, makes sense.
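
(Trying to make that concrete for myself: a dynamic feature would then
be fully self-contained, roughly like below -- again, all of this is
made up:)

  /* Hypothetical: a dynamically allocated, self-contained feature
   * record on some global list maintained by the klp core. It
   * deliberately carries no pointers into any patch module's code
   * or data, so it survives the module's removal. */
  struct klp_feature {
          struct list_head list;
          char name[32];          /* copied, not pointing into the module */
          unsigned long version;
  };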


>> > - For follow on livepatches, would their feature set need to match or
>> > exceed the values for all of the current features, or only the
>> > features that the new patch is advertising to support?  For example,
>> > if we have current features:
>> >
>> >   vmlinux : { { L1TF, v1 }, { CVE2, v2 } }
>> >
>> > should the incoming cumulative patch need L1TF >= v1 and CVE2 >= v2 ?
>> > Or does it only need L1TF >= v1 *or* CVE2 >= v2?
>> >
>> > Dunno if cumulative livepatches change that requirement or not.
>> 
>> I have to revisit the semantics of non-cumulative live patches
>> after atomic replace, but I think that if the one to get applied is
>> cumulative, it's an "and" and for those which are non-cumulative it
>> could perhaps be an "or".
>
> Good question. Well, I think that non-cumulative patches will not
> need to check anything. They are primarily used to add unrelated fixes.
> They might theoretically take over or extend an existing feature,
> but then the authors/users are on shifting sands on their own.

Hmm, I think we could require the versions to be >= what is currently
there, but only for the features the new non-cumulative live patch
supports?

I.e., for cumulative live patches, it would be: "all features currently
registered must be provided by a new cumulative live patch in the same
or higher version".

And for non-cumulative ones: "all features supported by a new
non-cumulative live patch must be provided in the same or higher version
than what's currently registered on the system".
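
For illustration, the check I have in mind could look roughly like
this. It's a pure sketch: klp_feature, klp_for_each_feature() and
klp_find_feature() are all made-up names, only the 'replace' flag
from the atomic replace patchset is real:

  static bool klp_features_compatible(struct klp_patch *old,
                                      struct klp_patch *new)
  {
          struct klp_feature *f, *nf;

          if (new->replace) {
                  /*
                   * Cumulative: every currently registered feature
                   * must be provided in the same or a higher version.
                   */
                  klp_for_each_feature(old, f) {
                          nf = klp_find_feature(new, f->name);
                          if (!nf || nf->version < f->version)
                                  return false;
                  }
          } else {
                  /*
                   * Non-cumulative: only the features the new patch
                   * itself supports must match or exceed what is
                   * currently registered.
                   */
                  klp_for_each_feature(new, nf) {
                          f = klp_find_feature(old, nf->name);
                          if (f && nf->version < f->version)
                                  return false;
                  }
          }

          return true;
  }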



>> Which brings another question: similar to the "sticky" case preventing
>> unpatching, we probably shouldn't allow the transition to get
>> reversed back to live patches not providing a particular feature at all
>> or only in a lower version?
>> 
>> Perhaps it would make sense to make stickiness a klp_feature property?
>
> It might be unclear which patch is sticky with the dynamically
> allocated structures. It might work with statically allocated
> structures.
>
> The question is if it is worth the complexity. It is a global
> state. I wonder if any callback would ever want to disable
> the sticky flag once set.
>

I don't care too much whether the stickiness is managed per-patch
or OR'ed together from individual features.

But I think what is important is how the feature versioning affects
the reversibility of transitions.

That is, as we won't allow a transition to a sticky patch to be
reversed, we probably also don't want to allow a transition to a patch
with a feature of a higher version to be reversed -- just to be
consistent there?
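
In code, I'd imagine the reversibility rule roughly as follows --
same disclaimers as above, everything here is made up, including
the ->sticky member and klp_prev_version():

  /* Refuse to reverse a transition to 'patch' if it is sticky or
   * if it bumped the version of any registered feature. */
  static bool klp_transition_reversible(struct klp_patch *patch)
  {
          struct klp_feature *f;

          if (patch->sticky)
                  return false;

          klp_for_each_feature(patch, f) {
                  /* klp_prev_version(): the version provided before
                   * this transition started. */
                  if (f->version > klp_prev_version(f->name))
                          return false;
          }

          return true;
  }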



>> Let me throw in a half-baked idea/dream of mine completely unrelated to
>> changing global semantics, safe upgrade paths and so on.
>> 
>> From time to time, the upstream mitigation for a certain CVE involves
>> new control knobs. For example, in the L1TF kvm case, there was a
>> new option (whose name I forgot) to disable hyperthreading. The live
>> patch author currently has no standardized way of providing such a
>> control knob to live patching users and thus, all he/she can do is
>> apply the somewhat coarse policy of "security over performance".
>> There are other, similar examples which I can't recall right now.
>> 
>> Long story short, it would be nice if we could, for certain subpatches
>> or "features" or whatever we call them, ask the user to enable them
>> explicitly through some /sys/.../enable file.
>
> Huh, this is another level of complexity.

Indeed :)

Ignore it, I just wanted to remark that static klp_features could have
the potential to get reused/extended in the future. But that should not
concern us now -- apologies for the confusion.

<digression>

> I wonder what the problems are with creating a /sys interface that
> might get transferred between livepatches.

Ok, in the L1TF kvm example, there's none.

But there are cases where I'd prefer to introduce my own, live-patch
specific control knob without any equivalent upstream. Take the
mincore() patches currently under discussion. They slightly change
behaviour and it would be nice, imo, if users could opt in to that.

And it would be great if this delayed enabling of the mincore()
live-subpatch could happen with full consistency-model guarantees:
writing '1' to
/sys/.../foo-patch/mincore-cve-feature/enable
would trigger a full transition for the sub-patch instead of only
flipping some boolean flag.
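
To sketch what I mean -- nothing of this exists, kobj_to_feature()
and klp_enable_subpatch() are pure fantasy:

  static ssize_t enable_store(struct kobject *kobj,
                              struct kobj_attribute *attr,
                              const char *buf, size_t count)
  {
          bool enable;
          int ret;

          ret = kstrtobool(buf, &enable);
          if (ret)
                  return ret;

          if (!enable)
                  return -EINVAL; /* only opt-in is discussed here */

          /* Kick off a proper klp transition for the sub-patch
           * instead of merely flipping a boolean flag. */
          ret = klp_enable_subpatch(kobj_to_feature(kobj));

          return ret ? ret : count;
  }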

But as said, it's only a nice-to-have and not important for the
discussion here.

> This would allow to create the same interface that is used by the
> upstream implementation.

I think that diverting from the upstream interface and collecting all
live-patch specific sysfs 'enable' files at a central location might
have the advantage that monitoring with Nagios (or whatever is in use
nowadays) would be straightforward.

</digression>


Thanks,

Nicolai


