Re: livepatch callback features

Nicolai Stange <nstange@xxxxxxx> · Thu, 14 Feb 2019 11:47:42 +0100

Joe Lawrence <joe.lawrence@xxxxxxxxxx> writes:

> On 2/13/19 5:12 AM, Petr Mladek wrote:
>> My idea is to create something similar to the shadow variables API.
>> Something like:
>>
>> struct klp_feature {
>>         /* feature ID, e.g. #define L1TF 1 */
>>         int id;
>>         /* Version of the code */
>>         int version;
>>         /* object + callbacks assiciated with the feature */
>>         struct klp_object *obj;
>>         /* optional data */
>>         void *data;
>>         /* list of available features */
>>         struct list_head node;
>> */
>>
>>
>> Note that it would be possible to add id, version, data to
>> struct klp_callbacks.
>>
>> But more practical would be to create it dynamically.
>> Then one callback might support more features. Also
>> It is less error-prone for porting because everyhing
>> is self-contained in the callback code. Finally,
>> it makes it easier to take over the state by another
>> cummulative livepatch.
>>
>>
>> Now the callbacks:
>>
>> struct klp_feature *klp_feature_alloc(id, version, obj, data);
>> void klp_feature_free(feature);
>>
>> struct klp_feature *klp_feature_get(id);
>> struct klp_feature *klp_feature_get(id);
>>
>>
>> Then the callback might do something like:
>>
>> #define L1TF 1
>> #define L1TF_VERSION 1
>>
>>
>> int klp_pre_patch_callback_l1tf(struct klp_object *obj)
>> {
>> 	struct klp_feature *feature;
>>
>> 	if (klp_feature_get(L1TF))
>> 		return 0;
>>
>> 	/* this is the first patch doing L1TF */
>> 	feature = klp_feature_alloc(L1TF, L1TF_VERSION, obj, NULL);
>> 	if (!feature)
>> 		return -ENOMEM;
>> }
>>
>> void klp_post_callback_l1tf(struct klp_object *obj)
>> {
>> 	struct klp_feature *feature;
>>
>> 	feature = klp_get_feature(L1TF);
>> 	if (!feature) {
>> 		WARN("This should never happen\n);
>>
>> 	if (feature->obj != obj && old_feature->version == L1TF_VERSION)
>> 	{
>> 		/* just take over */
>> 		old_feature->obj = obj;
>> 		return;
>> 	}
>>
>> 	# enable L1TF mitigation
>> 	l1tf_enable();
>> }
>>
>> void klp_pre_unpatch_callback(struct klp_object *obj)
>> {
>> 	/* Disable l1tf mitigating, revert all changes */
>> 	l1tf_disable();
>> }
>>
>> void klp_post_unpatch_callback(struct klp_object *obj)
>> {
>> 	struct klp_feature *feature;
>>
>> 	feature = klp_get_feature(L1TF);
>> 	if (!feature) {
>> 		WARN("This should never happen\n);
>>
>> 	if (feature->obj != obj && feature->version == L1TF_VERSION) {
>> 		/* This is revert, previous patch keeps using it */
>> 		return;
>> 	}
>>
>> 	klp_feature_free(feature);
>> }
>>
>>
>> Finally, the question is how to check the compatibility within update.
>> In this case, it might be easier if version, id is part of
>> struct klp_callbacks.
>>
>> Alternative solution would be to add another callback:
>>
>> 	    int klp_callback_feature_version(id, version);
>>
>> Then we could do:
>>
>>      /* all features supported */
>>      bool compatible = true;
>>
>>      for_each_allocated_feature(feature) {
>> 	bool found_compat = false;
>> 	for_each_object() {
>> 	    int version;
>> 	    version = obj->callbacks->feature_version(feature->id);
>> 		if (version >= feature->version) {
>> 			found_compat == true;
>> 			break;
>> 		}
>> 	   }
>> 	}
>>
>> 	if (!found_compat) {
>> 		compatible = false;
>> 		break;
>> 	}
>>
>>
>> It is more complicated than the global version. But it is not
>> a rocket since either.
>>
>> Now, the question is how much you like it.
>
> I see where this is going and off the top of my head, I think it would
> address the L1TF upgrade/downgrade safety concerns.
>
> If you prefer, we can revist this when there is code to review, but
> here my initial questions and impressions:
>
> - Do features need to be tightly coupled to callbacks?  They are a
> convenient tool to implement a policy, but I wonder if we lose
> anything by moving this up to the klp_patch level and letting the core
> handle it. (Like the versioning scheme.)

If I'm understanding you correctly, you wonder whether having a
klp_object member in klp_feature would be necessary or not, right?

I think it doesn't matter much in these "safe upgrade" scenarios
being discussed here.

I have to admit though that I don't see the need for having the
klp_features dynamic or "shadow variable like": I think we could embed
those statically into klp_patch as well.

> Features could still be stored in a list and the callbacks could get
> access to current vs. this patch's feature set.  They wouldn't be
> dynamic, but then the API supporting that could drop out.

Right, so during a transition, access to klp_features from the previous
live patch would be possible only up to and including the pre-patch
callback, correct? That would probably be enough to "take over" in all
situations. If the next live patch is sticky, ->data ownership wouldn't
be a problem, because, as Petr said in reply to the stickiness patch,
the transition should be non-reversable in this case. If we're dealing
with non-sticky live patches, the post-unpatch callback would only get
called if the live patch hasn't been superseeded by another one (thanks
to the atomic replace patchset) -- so we could teardown any ->data from
there.

> - Perhaps the features-to-callback connection in my mind is too
> strong. It makes it sound like the callbacks themselves have provided
> a feature. It is possible that patched routines are the unsafe
> culprits while callbacks only a mechanism to enforce safety.
> I guess this is why I think to abstract features out to the patch
> level.
>
> - One more point on the abstraction: pulling this out of the callbacks
> may allow one to understand livepatch features without the details of
> the callbacks.  Not sure if that's a good or bad thing ...
>
> - For follow on livepatches, would their feature set need to match or
> exceed the values for all of the current features, or only the
> features that the new patch is advertising to support?  For example,
> if we have current features:
>
>   vmlinux : { { L1TF, v1 }, { CVE2, v2 } }
>
> should the incoming cumulative patch need L1TF >= v1 and CVE2 >= v2 ?
> Or does it only need L1TF >= v1 *or* CVE2 >= v2?
>
> Dunno if cumulative livepatches change that requirement or not.

I have to revisit the semantics of non-cumulative live patches
after atomic replace, but I think that if the one to get applied is
cumulative, it's an "and" and for those which are non-cumulative it
could perhaps be an "or".

Which brings another question: similar to the "sticky" case preventing
unpatching, we probably shouldn't allow the the transition to get
reversed back to live patches not providing a particular feature at all
or only in a lower version?

In reply to the stickiness-patch, Petr proposed to introduce a

  klp_set_patch_sticky(obj)

by which a live patch can control the exact point (before register,
pre-patch or post-patch) from which on it becomes unsafe to revert a
live patch.

Perhaps it would make sense to make stickiness a klp_feature property?

I.e. klp_set_feature_stick(feat) which would
- prevent future patch disabling
- and transitions back to live patches not having that feature or only
  in a lower version

?

> - Wacky idea: can features be deprecated?  A livepatch neuters the
> unsafe condition, so future patches can now safely ignore it.
>

What would be the usage scenario? That the live patch author suddenly
discovers a brilliant idea to make everything safe, but only after
he/she has released some unsafe implementation already?

Or are you thinking more of releasing a dummy live patch for turning the
system back into the original, "safe" state?

> Sorry for all the thinking out loud, but I figured I would share my
> frame of mind after coming out of the version/sticky RFC code.

This discussion is actually quite fruitful, imo.

Let me throw in a half-baked idea/dream of mine completely unrelated to
changing global semantics, safe upgrade paths and so on.

>From time to time, the upstream mitigation for a certain CVE involves
new control knobs. For example, in the L1TF kvm case, there had been a
new option (whose name I forgot) to disable hyperthreading. The live
patch author currently has no standardized way of providing such a
control knob to live patching users and thus, all he/she can do about it is
to apply the somewhat coarse policy of "security over performance".
There are other examples for similar cases, which I forgot about now.

Long story short, it would be nice if we could, for certain subpatches
or "features" or however we call it, ask the user to enable them
explictly through some /sys/.../enable file.

I do realize that this is something for the far future, if at all. My
point is that if this

>> struct klp_feature {
>>         /* feature ID, e.g. #define L1TF 1 */
>>         int id;
>>         /* Version of the code */
>>         int version;
>>         /* object + callbacks assiciated with the feature */
>>         struct klp_object *obj;
>>         /* optional data */
>>         void *data;
>>         /* list of available features */
>>         struct list_head node;

+

           const char *name;
           const char *reference_url;
           bool enabled_default;
           bool enabled;
           bool sticky;

would get embedded into the klp_patch structure statically, then we
would have almost everything in place from a data structure POV.

Somewhere in the far future, a klp_patch would not directly reference a
list of klp_objects anymore, but a list of klp_features which in turn
would then enumerate the klp_objects. (It would be necessary that
multiple klp_feature can have a klp_object for patching the same .ko or
vmlinux though).

Transition semantics would be as follows:
- patch has some sticky klp_feature enabled -> no klp_patch disabling
  allowed
- patch has some sticky klp_feature enabled -> that feature cannot get
  disabled (through that new /sys/.../foo-patch/some-feature/enable
  file)
- Replacement by a new cumulative patch is only possible, if
  for all enabled + sticky features in the current patch, the next
  patch provides it at a version >= the current one.
  All matching features of the next patch inherit the stickiness
  + enabled state of the previous one.
- Reversing the transition is only possible if all sticky + enabled
  features are provided by the previous patch at the same version.

Of course, I'm aware that all of this needs discussion and would be a
lot of work to implement. But what I'm saying is that if we decided to
embed these klp_features statically into klp_patch, then there is some
potential for reusing them in the future. Perhaps exposing the name +
sticky info through /sys/ would even make sense at this point already
for the sake of diagnostics.

Nicolai