On Wed 2021-10-06 11:04:26, Peter Zijlstra wrote:
> On Wed, Oct 06, 2021 at 10:12:17AM +0200, Petr Mladek wrote:
> > IMHO, the original solution from v1 was better. We only needed to
>
> It was also terribly broken in other 'fun' ways. See below.
>
> > be careful when updating task->patch_state and clearing
> > TIF_PATCH_PENDING to avoid the race.
> >
> > The following might work:
> >
> > static int klp_check_and_switch_task(struct task_struct *task, void *arg)
> > {
> >         int ret;
> >
> >         /*
> >          * The stack is reliable only when the task is not running
> >          * on any CPU, except for the task running this code.
> >          */
> >         if (task_curr(task) && task != current) {
> >                 /*
> >                  * This only succeeds when the task is in NOHZ_FULL
> >                  * user mode. Such a task might be migrated
> >                  * immediately. We only need to be careful to set
> >                  * task->patch_state before clearing
> >                  * TIF_PATCH_PENDING so that the task migrates
> >                  * itself when entering the kernel in the meantime.
> >                  */
> >                 if (is_ct_user(task)) {
> >                         klp_update_patch_state(task);
> >                         return 0;
> >                 }
> >
> >                 return -EBUSY;
> >         }
> >
> >         ret = klp_check_stack(task, arg);
> >         if (ret)
> >                 return ret;
> >
> >         /*
> >          * The task is neither running on any CPU nor can it start
> >          * running. As a result, the ordering is not important and
> >          * no barrier is needed.
> >          */
> >         task->patch_state = klp_target_state;
> >         clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> >
> >         return 0;
> > }
> >
> > , where is_ct_user(task) would return true when the task is running in
> > CONTEXT_USER. If I get the context_tracking API correctly, then it
> > might be implemented the following way:
>
> That's not sufficient, you need to tag the remote task with a ct_work
> item that also runs klp_update_patch_state(), otherwise the remote CPU
> can enter kernel space between checking is_ct_user() and doing
> klp_update_patch_state():
>
>     CPU0                                CPU1
>
>                                         <user>
>
>     if (is_ct_user()) // true
>                                         <kernel-entry>
>                                         // run some kernel code
>     klp_update_patch_state()
>     *WHOOPSIE*
>
>
> So it needs to be something like:
>
>
>     CPU0                                CPU1
>
>                                         <user>
>
>     if (context_tracking_set_cpu_work(task_cpu(), CT_WORK_KLP))
>
>                                         <kernel-entry>
>       klp_update_patch_state()          klp_update_patch_state()
>
>
> So that CPU0 and CPU1 race to complete klp_update_patch_state() *before*
> any regular (!noinstr) code gets run.

Grr, you are right. I thought that we already migrated the task when
entering the kernel. But it seems that we do it only when leaving the
kernel, in exit_to_user_mode_loop().

> Which then means it needs to look something like:
>
> noinstr void klp_update_patch_state(struct task_struct *task)
> {
>         struct thread_info *ti = task_thread_info(task);
>
>         preempt_disable_notrace();
>         if (arch_test_bit(TIF_PATCH_PENDING, (unsigned long *)&ti->flags)) {
>                 /*
>                  * Order loads of TIF_PATCH_PENDING vs klp_target_state.
>                  * See klp_init_transition().
>                  */
>                 smp_rmb();
>                 task->patch_state = __READ_ONCE(klp_target_state);
>                 /*
>                  * Concurrent against self; must observe updated
>                  * task->patch_state if !TIF_PATCH_PENDING.
>                  */
>                 smp_mb__before_atomic();

IMHO, smp_wmb() should be enough. We get here only when this CPU has set
task->patch_state right above, so the CPU running this code already sees
the correct task->patch_state.

The read barrier is needed only when @task is entering the kernel and
does not see TIF_PATCH_PENDING. That case is handled by the smp_rmb() in
the "else" branch below.

It is possible that both CPUs see TIF_PATCH_PENDING and both set
task->patch_state. But it should not cause any harm because they set the
same value, unless something really crazy happens with the internal CPU
buses and caches.
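Just to spell out the pairing I have in mind (a rough sketch only, not a
patch; I use the generic thread-flag helpers here instead of the
arch_*_bit() variants from your code above):

        /* CPU migrating @task remotely (the branch above): */
        task->patch_state = READ_ONCE(klp_target_state);
        smp_wmb();                      /* pairs with the smp_rmb() below */
        clear_tsk_thread_flag(task, TIF_PATCH_PENDING);

        /* @task itself entering the kernel, flag already cleared: */
        if (!test_tsk_thread_flag(current, TIF_PATCH_PENDING)) {
                smp_rmb();              /* pairs with the smp_wmb() above */
                /* later reads see the updated current->patch_state */
        }

I.e. the classic store-data/wmb/store-flag vs. load-flag/rmb/load-data
pattern.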
>                 arch_clear_bit(TIF_PATCH_PENDING, (unsigned long *)&ti->flags);
>         } else {
>                 /*
>                  * Concurrent against self, see smp_mb__before_atomic()
>                  * above.
>                  */
>                 smp_rmb();

Yeah, this is the counterpart to the above smp_wmb().

>         }
>         preempt_enable_notrace();
> }

Now, I am scared to increase my paranoia level and search for even more
possible races. I feel overwhelmed at the moment ;-)

Best Regards,
Petr
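PS: Just for the record, the is_ct_user() that I had in mind above was
something along the following lines. It is an untested sketch; I am not
sure that peeking at the remote CPU's context_tracking state this way is
a sanctioned use of the API, and as your scenario shows it would not be
sufficient on its own anyway:

        /*
         * Untested sketch only. Reading the remote CPU's state like this
         * is inherently racy; see the kernel-entry race above.
         */
        static bool is_ct_user(struct task_struct *task)
        {
                int cpu = task_cpu(task);

                if (!context_tracking_enabled_cpu(cpu))
                        return false;

                return READ_ONCE(per_cpu(context_tracking.state, cpu)) == CONTEXT_USER;
        }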