On Tue 2022-07-26 20:26:41, Rik van Riel wrote:
> On Tue, 2022-07-26 at 17:10 -0700, Josh Poimboeuf wrote:
> > On Mon, Jul 25, 2022 at 09:49:19AM -0400, Rik van Riel wrote:
> > > When a KLP fails to apply, klp_reverse_transition will clear the
> > > TIF_PATCH_PENDING flag on all tasks, except for newly created tasks
> > > which are not on the task list yet.
> >
> > This paragraph and $SUBJECT both talk about a reverse transition.
> > Isn't it also possible to race on a normal (forward) transition?
>
> I don't know whether the race is also possible on a forward
> transition. If the parent task has transitioned, will
> the child have, as well, by the time we reach the end of fork?

I think that the race should also be possible with the forward
transition. I do not see what would prevent it.

> I suppose the only way the parent task can transition while
> inside fork would be if none of the functions in its stack
> need to be transitioned, and at that point the child process
> would automatically be safe, too?

IMHO, these races might be dangerous only when fork() calls, on the
way out, a function that is livepatched but was not on the stack
when the process was copied.

Anyway, the patch should make sure that task->patch_state and
TIF_PATCH_PENDING are always consistent when the child is added
to the global task list. So, we should always be on the safe side.

> However, we have only observed this warning on reverse transitions
> for some reason.

IMHO, it is because the race during the forward transition is kind of
"self-healing":

    parent:                         worker:

    fork()
      # copy set TIF_PATCH_PENDING
    # schedule
                                    klp_try_complete_transition()
                                      clear_bit(parent, TIF_PATCH_PENDING);
                                      parent->patch_state = klp_target_state;
    # running again
      # copy already migrated parent->patch_state

    later:
                                      clear_bit(child, TIF_PATCH_PENDING);
                                      child->patch_state = klp_target_state;

As a result, child->patch_state will be updated twice to klp_target_state.

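The forward-transition interleaving can be replayed with a small
user-space model. This is only a sketch, not kernel code: struct
task_model, migrate() and forward_race_self_heals() are hypothetical
names made up for illustration, and the boolean patch_pending merely
stands in for TIF_PATCH_PENDING.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of this is kernel API. */
enum { MODEL_UNPATCHED = 0, MODEL_PATCHED = 1 };

struct task_model {
	bool patch_pending;	/* stands in for TIF_PATCH_PENDING */
	int patch_state;	/* stands in for task->patch_state */
};

/* Models what the transition worker does when it migrates one task. */
static void migrate(struct task_model *t, int target_state)
{
	t->patch_pending = false;
	t->patch_state = target_state;
}

/*
 * Replays the forward-transition race: the flag is copied before the
 * parent is migrated, the state is copied afterwards.
 * Returns true when the child ends up migrated to the target state.
 */
static bool forward_race_self_heals(void)
{
	const int target = MODEL_PATCHED;
	struct task_model parent = {
		.patch_pending = true,
		.patch_state = MODEL_UNPATCHED,
	};
	struct task_model child;

	child.patch_pending = parent.patch_pending;	/* fork(): copy set flag */
	migrate(&parent, target);			/* worker migrates parent */
	child.patch_state = parent.patch_state;		/* fork(): copy migrated state */

	/* Inconsistent for a while: flag still set, state already at target. */
	assert(child.patch_pending && child.patch_state == target);

	/* Later: the still-set flag makes the worker visit the child too. */
	if (child.patch_pending)
		migrate(&child, target);

	return !child.patch_pending && child.patch_state == target;
}
```

In this model the child is migrated a second time only because it
inherited the set flag, which is why the inconsistency goes away on
its own.
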
The problematic situation during revert:

    parent:                         another process:

    # migrate parent
    clear_bit(parent, TIF_PATCH_PENDING);
    parent->patch_state = klp_target_state;

    fork()
      # copy cleared TIF_PATCH_PENDING
                                    klp_revert_patch()
                                      # invert @klp_target_state
                                      set_bit(parent, TIF_PATCH_PENDING)

      # copy parent->patch_state that needs migration once again

    # migrated once again after revert
    clear_bit(parent, TIF_PATCH_PENDING);
    parent->patch_state = klp_target_state;

    WARNING: child will never get migrated because it copied the cleared
             TIF_PATCH_PENDING before @klp_target_state was inverted

Summary: It is great that the race was found and fixed.

Best Regards,
Petr