On Thu, May 05, 2016 at 01:57:01PM +0200, Petr Mladek wrote: > I have missed that the two commands are called with preemption > disabled. So, I had the following crazy scenario in mind: > > > CPU0 CPU1 > > klp_enable_patch() > > klp_target_state = KLP_PATCHED; > > for_each_task() > set TIF_PENDING_PATCH > > # task 123 > > if (klp_patch_pending(current) > klp_patch_task(current) > > clear TIF_PENDING_PATCH > > smp_rmb(); > > # switch to assembly of > # klp_patch_task() > > mov klp_target_state, %r12 > > # interrupt and schedule > # another task > > > klp_reverse_transition(); > > klp_target_state = KLP_UNPATCHED; > > klt_try_to_complete_transition() > > task = 123; > if (task->patch_state == klp_target_state; > return 0; > > => task 123 is in target state and does > not block conversion > > klp_complete_transition() > > > # disable previous patch on the stack > klp_disable_patch(); > > klp_target_state = KLP_UNPATCHED; > > > # task 123 gets scheduled again > lea %r12, task->patch_state > > => it happily stores an outdated > state > Thanks for the clear explanation, this helps a lot. > This is why the two functions should get called with preemption > disabled. We should document it at least. I imagine that we will > use them later also in another context and nobody will remember > this crazy scenario. > > Well, even disabled preemption does not help. The process on > CPU1 might be also interrupted by an NMI and do some long > printk in it. > > IMHO, the only safe approach is to call klp_patch_task() > only for "current" on a safe place. Then this race is harmless. > The switch happen on a safe place, so that it does not matter > into which state the process is switched. I'm not sure about this solution. When klp_complete_transition() is called, we need all tasks to be patched, for good. We don't want any of them to randomly switch to the wrong state at some later time in the middle of a future patch operation. How would changing klp_patch_task() to only use "current" prevent that? > By other words, the task state might be updated only > > + by the task itself on a safe place > + by other task when the updated on is sleeping on a safe place > > This should be well documented and the API should help to avoid > a misuse. I think we could fix it to be safe for future callers who might not have preemption disabled with a couple of changes to klp_patch_task(): disabling preemption and testing/clearing the TIF_PATCH_PENDING flag before changing the patch state: void klp_patch_task(struct task_struct *task) { preempt_disable(); if (test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING)) task->patch_state = READ_ONCE(klp_target_state); preempt_enable(); } We would also need a synchronize_sched() after the patching is complete, either at the end of klp_try_complete_transition() or in klp_complete_transition(). That would make sure that all existing calls to klp_patch_task() are done. -- Josh -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html