Re: livepatch callback features

Joe Lawrence <joe.lawrence@xxxxxxxxxx> writes:

> On 02/15/2019 05:58 AM, Nicolai Stange wrote:
>> Petr Mladek <pmladek@xxxxxxxx> writes:
>>> On Thu 2019-02-14 11:47:42, Nicolai Stange wrote:
>>>> Joe Lawrence <joe.lawrence@xxxxxxxxxx> writes:
>>>
>>> I see the following advantages of the dynamic approach:
>>>
>>>       + Easy to transfer (it just stays in memory, on some list).
>>>
>>>       + Localized code (alloc/get only where needed, no need to
>>>         define it in some global structures, lists).
>>>
>>>       + One structure for a single feature all the time. No confusion
>>>         which structure you work with. Temporary states might always
>>>         be kept in local static variables and stored into
>>>         the structure once the feature is fully migrated.
>>>
>>>         I could imagine that some people might see this as
>>>         a disadvantage though.
>
> + Clearer sysfs entry, if we choose to expose it, since it would be
> decoupled from any specific livepatch lifetime and could live under
> /sys/.../livepatch/features/
>
>>> The disadvantages are:
>>>
>>>      + allocation might fail (must be done in pre_patch())
>
> Or module .init?
>
>>>      + more complicated compatibility check (another callback
>>>        to check provided features)
>
> Ditto?

I think the feature advertisement should be coupled to the klp_patch
structure for clarity, be it through an additional callback or some other means.

These kinds of checks done from .init tend to become racy because, IIRC,
two live patch modules can be loaded concurrently.
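
Something along these lines is what I have in mind (purely
illustrative; struct klp_feature and the ->features member are made-up
names, only struct klp_patch itself exists today):

struct klp_feature {
        const char *name;               /* e.g. "l1tf" */
        unsigned long version;          /* monotonically increasing */
};

static struct klp_feature features[] = {
        { .name = "l1tf", .version = 1 },
        { .name = "cve2", .version = 2 },
        { }     /* terminator */
};

/* the usual objs/funcs boilerplate omitted */
static struct klp_patch patch = {
        .mod = THIS_MODULE,
        .objs = objs,
        /* .features = features,        hypothetical advertisement */
};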


>>>>> Features could still be stored in a list and the callbacks could get
>>>>> access to current vs. this patch's feature set.  They wouldn't be
>>>>> dynamic, but then the API supporting that could drop out.
>>>
>>>> Right, so during a transition, access to klp_features from the previous
>>>> live patch would be possible only up to and including the pre-patch
>>>> callback, correct? That would probably be enough to "take over" in all
>>>> situations. If the next live patch is sticky, ->data ownership wouldn't
>>>> be a problem, because, as Petr said in reply to the stickiness patch,
>>>> the transition should be non-reversible in this case. If we're dealing
>>>> with non-sticky live patches, the post-unpatch callback would only get
>>>> called if the live patch hasn't been superseded by another one (thanks
>>>> to the atomic replace patchset) -- so we could tear down any ->data from
>>>> there.
>>>
>>> Anyway, this is another motivation to remove object pointer
>>> from the feature structure. In other words, the dynamic structure
>>> should not point to any static code that might get later removed
>>> (because it is error prone).
>>
>> Yes, makes sense. 
>
> Agreed here.
>
>>>>> - For follow on livepatches, would their feature set need to match or
>>>>> exceed the values for all of the current features, or only the
>>>>> features that the new patch is advertising to support?  For example,
>>>>> if we have current features:
>>>>>
>>>>>    vmlinux : { { L1TF, v1 }, { CVE2, v2 } }
>>>>>
>>>>> should the incoming cumulative patch need L1TF >= v1 and CVE2 >= v2 ?
>>>>> Or does it only need L1TF >= v1 *or* CVE2 >= v2?
>>>>>
>>>>> Dunno if cumulative livepatches change that requirement or not.
>>>>
>>>> I have to revisit the semantics of non-cumulative live patches
>>>> after atomic replace, but I think that if the one to get applied is
>>>> cumulative, it's an "and", and for those which are non-cumulative it
>>>> could perhaps be an "or".
>>>
>>> Good question. Well, I think that non-cumulative patches will not
>>> need to check anything. They are primarily used to add unrelated fixes.
>>> They might theoretically take over or extend an existing feature,
>>> but then the authors/users are on their own, on shifting sands.
>
> It may get confusing, but someone may want a set of features that
> handle scheduler fixes and an orthogonal set that touch a device
> driver.  In that case, non-cumulative patches may want to handle these
> separately.
>
>> Hmm, I think we could require the versions to be >= what is currently
>> there, but only for the features the new non-cumulative live patch
>> supports?
>>
>> I.e., for cumulative live patches, it would be: "all features currently
>> registered must be provided by a new cumulative live patch in the same
>> or higher version".
>>
>> And for non-cumulative ones: "all features supported by a new
>> non-cumulative live patch must be provided in the same or higher version
>> than what's currently registered on the system".
>
> Yeah I think that those would be the two major use cases we could
> choose to support.  As I mentioned before, I'd be curious to see how
> the non-cumulative model scales after many livepatches are loaded.
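
To make the two rules from above concrete, the check could boil down
to roughly the following (again only a sketch on top of a hypothetical
struct klp_feature; klp_for_each_feature() and klp_find_feature() are
made-up helpers over a patch's assumed ->features array):

static bool klp_features_compatible(struct klp_patch *old_p,
                                    struct klp_patch *new_p)
{
        struct klp_feature *of, *nf;

        if (new_p->replace) {
                /*
                 * Cumulative: every currently registered feature must
                 * be carried over in the same or a higher version.
                 */
                klp_for_each_feature(old_p, of) {
                        nf = klp_find_feature(new_p, of->name);
                        if (!nf || nf->version < of->version)
                                return false;
                }
        } else {
                /*
                 * Non-cumulative: only the features the new patch
                 * advertises itself must not regress.
                 */
                klp_for_each_feature(new_p, nf) {
                        of = klp_find_feature(old_p, nf->name);
                        if (of && nf->version < of->version)
                                return false;
                }
        }

        return true;
}
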
>
>>>> Which brings another question: similar to the "sticky" case preventing
>>>> unpatching, we probably shouldn't allow the transition to get
>>>> reversed back to live patches not providing a particular feature at all
>>>> or only in a lower version?
>>>>
>>>> Perhaps it would make sense to make stickiness a klp_feature property?
>>>
>>> It might be unclear which patch is sticky with the dynamically
>>> allocated structures. It might work with dynamically allocated
>>> structures.
>>>
>>> The question is if it is worth the complexity. It is a global
>>> state. I wonder if any callback would ever want to disable
>>> the sticky flag once set.
>>>
>>
>> I don't care too much whether or not the stickiness is managed per-patch
>> or or'ed together from individual features.
>>
>> But I think what is important is how the feature versioning affects
>> the reversibility of transitions.
>>
>> That is, as we won't allow a transition to a sticky patch to be
>> reversed, we probably also don't want to allow a transition to a patch
>> with a feature of a higher version to be reversed -- just to be
>> consistent there?
>
> I had a question about the reversibility of sticky patches in my
> Friday reply.  If we choose to disallow that, then it might make sense to
> extend the same policy to (higher) features.  But I had a question
> about callback execution with respect to reversal in that other mail
> I'd like to sort out first.

For completeness: Petr proposed to introduce some klp_disable_revert()
or the like.

FWIW, I think that disallowing downgrades would be much more common than
disallowing reverts. Because for reverts, the live patch author could
most often roll back anything from the pre_unpatch() callback.
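
E.g. something as simple as this would do (sketch; the opt-in flag is
made up, only struct klp_callbacks and its pre_unpatch() hook are real):

static bool livepatch_demo_optin;       /* made-up example knob */

static void demo_pre_unpatch(struct klp_object *obj)
{
        /* Undo the behavioural change before the old code is live again. */
        livepatch_demo_optin = false;
}

static struct klp_callbacks demo_callbacks = {
        .pre_unpatch = demo_pre_unpatch,
};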

So, if I were asked to unify the version- and unpatch-"stickiness", I'd
come up with

enum klp_feature_stickiness {
        kfs_none,               /* no stickiness, allow all transitions */
        kfs_downgrade,          /* disallow downgrades to lower versions */
        kfs_unpatch,            /* disallow unpatch transitions */
};

where later members enforce previous ones. Or can anybody think of a use
case where unpatching would not be allowed, but downgrades are?
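
For illustration, the enforcement order could then boil down to
something like this (sketch, none of these helpers exist yet):

/* Later enum members imply the earlier ones, hence the ordered checks. */
static bool klp_feature_reverse_allowed(enum klp_feature_stickiness s,
                                        unsigned long cur_version,
                                        unsigned long new_version,
                                        bool is_unpatch)
{
        if (is_unpatch)
                return s < kfs_unpatch;

        if (new_version < cur_version)
                return s < kfs_downgrade;

        return true;
}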


>
>>>> Let me throw in a half-baked idea/dream of mine completely unrelated to
>>>> changing global semantics, safe upgrade paths and so on.
>>>>
>>>> From time to time, the upstream mitigation for a certain CVE involves
>>>> new control knobs. For example, in the L1TF kvm case, there had been a
>>>> new option (whose name I forgot) to disable hyperthreading. The live
>>>> patch author currently has no standardized way of providing such a
>>>> control knob to live patching users and thus all he/she can do is apply
>>>> the somewhat coarse policy of "security over performance".
>>>> There are other examples for similar cases, which I forgot about now.
>>>>
>>>> Long story short, it would be nice if we could, for certain subpatches
>>>> or "features" or however we call it, ask the user to enable them
>>>> explictly through some /sys/.../enable file.
>>>
>>> Huh, this is another level of complexity.
>>
>> Indeed :)
>>
>> Ignore it, I just wanted to remark that static klp_features could have
>> the potential to get reused/extended in the future. But that should not
>> concern us now -- apologies for the confusion.
>>
>> <digression>
>>
>>> I wonder what the problems are with creating a /sys interface that might
>>> get transferred between livepatches.
>>
>> Ok, in the L1TF kvm example, there's none.
>>
>> But there are cases where I'd prefer to introduce my own, live-patch
>> specific control knob with no upstream equivalent.  Take the
>> mincore() patches currently under discussion. They slightly change
>> behaviour and it would be nice imo if the users could opt in to that.
>
> So we would need a) persistent knowledge, ie livepatch core keeps a
> memory of features, or a shadow variable-style dictionary of "things"
> and b) persistent sysfs interface for this?
>
> I wonder if we should have made the shadow variable API more generic,
> at least in name, so it could be used for this.
>
>> And it would be great if this delayed enabling of the mincore()
>> live-subpatch could happen with full consistency-model guarantees:
>> writing '1' to
>> /sys/.../foo-patch/mincore-cve-feature/enable
>> would trigger a full transition for the sub-patch instead of only
>> flipping some boolean flag.
>
> Would this second transition take any more or less time than the patch
> transition?  Just wondering if this could be implemented via module
> parameters and then incorporated as part of the usual livepatch
> transition/callback mechanisms.
>
> Combining this idea with features kinda flips things on its
> head... instead of thinking of livepatches as providing a mere set of
> function replacements, one might think more of features, feature
> versions and feature transitions.  The livepatch module becomes just a
> delivery mechanism.

That's exactly what I had in mind. But anyway, people apparently
don't like it much and there's probably no point in discussing it any
further...


Thanks!

Nicolai


>
>> But as said, it's only a nice-to-have and not important for the
>> discussion here.
>
> Still an interesting idea and worth tossing around for debate.
>
>>> This would allow to create the same interface that is used by the
>>> upstream implementation.
>>
>> I think that diverting from the upstream interface and collecting all
>> live-patch specific sysfs 'enable' files at a central location might
>> have the advantage that monitoring with Nagios (or whatever is in use
>> nowadays) would be straightforward.
>>
>> </digression>
>
> There would be advantages to both approaches, but in the end I would
> agree that collecting them under a livepatch namespace would be safer
> as we may not be able to ensure 1:1 semantics with their upstream
> counterparts.
>
> -- Joe
>

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


