> > That's not sufficient, you need to tag the remote task with a ct_work
> > item to also run klp_update_patch_state(), otherwise the remote CPU can
> > enter kernel space between checking is_ct_user() and doing
> > klp_update_patch_state():
> >
> >     CPU0                                    CPU1
> >
> >                                             <user>
> >
> >     if (is_ct_user()) // true
> >                                             <kernel-entry>
> >                                             // run some kernel code
> >     klp_update_patch_state()
> >     *WHOOPSIE*
> >
> >
> > So it needs to be something like:
> >
> >
> >     CPU0                                    CPU1
> >
> >                                             <user>
> >
> >     if (context_tracking_set_cpu_work(task_cpu(), CT_WORK_KLP))
> >
> >                                             <kernel-entry>
> >       klp_update_patch_state()                klp_update_patch_state()
> >
> >
> > So that CPU0 and CPU1 race to complete klp_update_patch_state() *before*
> > any regular (!noinstr) code gets run.
>
> Grr, you are right. I thought that we already migrated the task when
> entering the kernel. But it seems that we do it only when leaving
> the kernel in exit_to_user_mode_loop().

That is correct. You are probably confused by the old kGraft
implementation, which added the TIF flag also to _TIF_WORK_SYSCALL_ENTRY
so that syscall_trace_enter() processed it too. But the upstream
solution has always switched tasks only on their exit to user mode,
because that was deemed sufficient (see the exit_to_user_mode_loop()
excerpt below).

Btw, just for fun, the old kGraft in one of its first incarnations also
had the following horrible hack to "solve" exactly this problem:

+/*
+ * Tasks which are running in userspace after the patching has been started
+ * can immediately be marked as migrated to the new universe.
+ *
+ * If this function returns non-zero (i.e. also when error happens), the task
+ * needs to be migrated using kgraft lazy mechanism.
+ */
+static inline bool kgr_needs_lazy_migration(struct task_struct *p)
+{
+       unsigned long s[3];
+       struct stack_trace t = {
+               .nr_entries = 0,
+               .skip = 0,
+               .max_entries = 3,
+               .entries = s,
+       };
+
+       save_stack_trace_tsk(p, &t);
+
+       return t.nr_entries > 2;
+}
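For reference, the only place where an upstream task migrates itself
today is the _TIF_PATCH_PENDING handling in exit_to_user_mode_loop().
Roughly the following; abridged from kernel/entry/common.c with the
other work bits left out, so take it as illustrative rather than exact:

static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
                                            unsigned long ti_work)
{
        /*
         * Before returning to user space, make sure all pending work
         * items have been completed.
         */
        while (ti_work & EXIT_TO_USER_MODE_WORK) {
                /* ... rescheduling, signals, etc. omitted ... */

                /*
                 * TIF_PATCH_PENDING is set on every task when a live
                 * patching transition starts. The task switches itself
                 * to the new patch state here, right before it returns
                 * to user space.
                 */
                if (ti_work & _TIF_PATCH_PENDING)
                        klp_update_patch_state(current);

                ti_work = read_thread_flags();
        }

        return ti_work;
}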
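And if I read Peter's diagrams above correctly, the transition side
would then do something like the sketch below. Note that
context_tracking_set_cpu_work() and CT_WORK_KLP exist only in the
proposed series, so the exact semantics may differ, and
klp_try_switch_user_task() is just a name made up for illustration:

/*
 * Sketch only: try to switch a task which appears to be running in
 * userspace on a remote CPU.
 */
static bool klp_try_switch_user_task(struct task_struct *task)
{
        /*
         * Queue CT_WORK_KLP on the remote CPU. This is assumed to fail
         * when that CPU is not (or no longer) in userspace, i.e. when
         * it raced with a kernel entry.
         */
        if (!context_tracking_set_cpu_work(task_cpu(task), CT_WORK_KLP))
                return false;

        /*
         * From here on both CPUs race to complete the switch: the
         * remote kernel-entry path runs klp_update_patch_state() from
         * the queued work before any regular (!noinstr) code, and we
         * call it here as well. Whoever comes first wins; the second
         * call is a no-op because klp_update_patch_state() only acts
         * while TIF_PATCH_PENDING is still set.
         */
        klp_update_patch_state(task);
        return true;
}

Miroslav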