On Thu, 14 Apr 2016, Josh Poimboeuf wrote: > On Thu, Apr 14, 2016 at 11:25:18AM +0200, Miroslav Benes wrote: > > On Fri, 25 Mar 2016, Josh Poimboeuf wrote: > > > > > +/* > > > + * klp_update_task_universe() - change the patched state of a task > > > + * @task: The task to change > > > + * > > > + * Converts the patched state of the task so that it will switch to the set of > > > + * functions in the goal universe. > > > + */ > > > +static inline void klp_update_task_universe(struct task_struct *task) > > > +{ > > > + /* > > > + * The corresponding write barriers are in klp_init_transition() and > > > + * klp_start_transition(). See the comments there for an explanation. > > > + */ > > > + smp_rmb(); > > > + > > > + task->klp_universe = klp_universe_goal; > > > +} > > > > I wonder whether we should introduce a static_key for this function. > > klp_update_task_universe() is called in some important code paths and it > > is called everytime if CONFIG_LIVEPATCH is on. It does not matter much for > > syscall paths, because we enter slowpath there only if TIF_KLP_NEED_UPDATE > > is set and that is set iff patching is in progress. But we call the > > function also elsewhere. So maybe it would be nice to introduce static_key > > in the function which would be on if the patching is in progress and off > > if not. > > > > Maybe it is just an overoptimization though. > > klp_update_task_universe() is called from: > > - exit_to_usermode_loop() (slow path for syscall/irq exit) > - copy_process() (though I'll be changing this code so children inherit > parents' flags) > - init_idle() > - cpu_idle_loop() > - various places in livepatch code > > I think none of those are fast paths, so my feeling is that a static key > would be overkill. Agreed. > > > +static bool klp_try_switch_task(struct task_struct *task) > > > +{ > > > + struct rq *rq; > > > + unsigned long flags; > > > + int ret; > > > + bool success = false; > > > + > > > + /* check if this task has already switched over */ > > > + if (task->klp_universe == klp_universe_goal) > > > + return true; > > > > [...] > > > > > +/* > > > + * This function can be called in the middle of an existing transition to > > > + * reverse the direction of the universe goal. This can be done to effectively > > > + * cancel an existing enable or disable operation if there are any tasks which > > > + * are stuck in the starting universe. > > > + */ > > > +void klp_reverse_transition(void) > > > +{ > > > + struct klp_patch *patch = klp_transition_patch; > > > + > > > + klp_start_transition(!klp_universe_goal); > > > + klp_try_complete_transition(); > > > + > > > + patch->enabled = !patch->enabled; > > > +} > > > > This function could be called iff the patching is in progress, there are > > some not-yet-migrated tasks and we wait for scheduled workqueue to run > > klp_try_complete_transition() again, right? I have two questions. > > > > 1. Is the call to klp_try_complete_transition() here necessary? It would > > be called via workqueue, wouldn't it? I suspect we don't gain much with > > this and we introduce possible future problems because of "double > > success" of klp_try_complete_transition() (nothing serious there as of > > now though). But I might be wrong because I got really confused by this > > piece of code and its context :) > > Yeah, the call to klp_try_complete_transition() isn't necessary because > of the workqueue. It's still kind of nice to have though, because it > tries to transition immediately rather than waiting for the work (and we > might end up deciding to change the delayed work frequency to be less > often than once per second). Yes, there are potential benefits with lower frequency. > I can't really envision that future bugs related to "double success" > would be much to worry about. The work function does check for > klp_transition_patch beforehand, and if it didn't, > klp_complete_transition() would die loudly with a NULL pointer > dereference. Of course, it's always hard to anticipate future code > changes and I could be completely wrong. Ah, right. There's a check and we clear klp_transition_patch in klp_try_complete_transition() (I always forget there is a call to klp_complete_transition "hidden" there :)). > So I would vote to keep this call, but I don't feel too strongly about > it since it isn't strictly needed. Yeah, let's leave it as it is because there is no real harm. > > 2. klp_reverse_transition() works well because not-yet-migrated tasks are > > still in KLP_UNIVERSE_OLD (without loss of generality) and since > > klp_universe_goal is swapped they are in fact in their target universe. > > klp_try_switch_task() returns immediately in such case. The rest is > > migrated in an ordinary way. > > > > What about TIF_KLP_NEED_UPDATE (I know it is introduced in later patch but > > it is easier to discuss it here)? klp_start_transition() resets the flag > > for all the task. Shouldn't the flag be cleared for the task in > > klp_try_switch_task() if task->klp_universe == klp_universe_goal? In fact > > it does not matter if it is set. The patching success is determined by > > klp_func->klp_universe booleans. But those tasks would needlessly go > > through syscall slowpaths (see note above about static_key) which could > > hurt performance. > > > > Does this make sense? > > I agree with you, but for a different reason :-) > > And actually there's another way this could happen: if a task is forked > during klp_start_transition() after setting the universe goal but before > setting the thread flags, it would already be transitioned but its flag > would get unnecessarily set. (Though as I mentioned above, the > copy_process() code will be changed anyway, so that should no longer be > an issue). > > But I don't think it's really a performance issue, because: > > 1) Assuming a reliable stack, in most cases the majority of tasks are > transitioned immediately. So when reversing the transition, there > would be only a few tasks that would get the flag set unnecessarily. Correct. Majority of tasks would get the flag cleared before they even get a change to go through the syscall gate. > 2) For those tasks which do have the flag set unnecessarily, only the > next syscall will do the slow path. After that the flag gets > cleared. So it only slows down a single syscall for each task. At a > macro level, that's just a drop in the bucket for performance and, > IMO, not worth worrying about and adding (potentially buggy) code > for. Makes sense. > However, I can think of another reason setting the flag unnecessarily > might be bad: it's just conceptually confusing and could maybe lead to > bugs. (ok, here we go trying to predict the future again!) > > Specifically, after klp_complete_transition(), a reasonable livepatch > developer (is that an oxymoron? ;-)) might assume that all tasks' > TIF_KLP_NEED_UPDATE flags are cleared and might write code which makes > that assumption. > > And, even with today's code it's kind of hard to convince myself that > having a spurious TIF_KLP_NEED_UPDATE flag set from a previous patching > operation won't cause issues on the next patch. > > So maybe we should do as you suggest and clear the flag in > klp_try_switch_task() if the task is already in the goal universe. > > Or if we wanted to be paranoid we could just clear all the flags in > klp_complete_transition(), but maybe that's a little too sloppy/lazy? I'll leave it up to you, because I am not able to decide that. Both options seem good to me. The former is somewhat more appealing, the latter is a safe bet. Thanks, Miroslav -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html