On 1/10/20 7:38 am, Thomas Gleixner wrote:
> On Wed, Sep 30 2020 at 20:35, Peter Zijlstra wrote:
>> On Wed, Sep 30, 2020 at 08:00:59PM +0200, Thomas Gleixner wrote:
>>> On Wed, Sep 30 2020 at 19:03, Peter Zijlstra wrote:
>>>> On Wed, Sep 30, 2020 at 05:40:08PM +0200, Thomas Gleixner wrote:
>>>> Also, that preempt_disable() in there doesn't actually do anything.
>>>> Worse, preempt_disable(); for_each_cpu(); is an anti-pattern. It mixes
>>>> static_cpu_has() and boot_cpu_has() in the same bloody condition and
>>>> has a pointless ret variable.
>>
>> Also, I forgot to add, it accesses ->cpus_mask without the proper
>> locking, so it could be reading intermediate state from whatever
>> cpumask operation is in progress.
>
> Yes. I saw that after hitting send. :(
>
>>> I absolutely agree, and I really missed it when looking at it before
>>> merging. cpus_read_lock()/unlock() is the right thing to do, if at
>>> all.
>>>
>>>> It's shoddy code that only works if you align the planets right. We
>>>> really shouldn't provide interfaces that are this bad.
>>>>
>>>> Its correct operation is only by accident.
>>>
>>> True :(
>>>
>>> I understand Balbir's problem and it makes some sense to provide a
>>> solution. We can:
>>>
>>>  1) Reject set_affinity() if the task has that flush muck enabled
>>>     and user space tries to move it to an SMT-enabled core.
>>>
>>>  2) Disable the muck if the task suddenly finds itself running on an
>>>     SMT-enabled core (hotplug says hello).
>>>
>>>     This one is nasty because there is no feedback to user space
>>>     about the wreckage.
>>
>> That's an AND, right, not an OR, because 1) deals with
>> sched_setaffinity() and 2) deals with hotplug.
>
> It was meant as AND, of course.
>
>> Now 1) requires an arch hook in sched_setaffinity(), something I'm
>> not keen on providing; once we provide it, who knows what strange and
>> wonderful things archs will dream up.
>
> I don't think so. We can have that magic in core:
>
>   #ifdef CONFIG_HAS_PARANOID_L1D_FLUSH
>   static bool paranoid_l1d_valid(struct task_struct *tsk,
>                                  const struct cpumask *msk)
>   {
>           if (!test_tsk_thread_flag(tsk, TIF_SPEC_L1D_FLUSH))
>                   return true;
>           /* Do magic stuff */
>           return res;
>   }
>   #else
>   static bool paranoid_l1d_valid(struct task_struct *tsk,
>                                  const struct cpumask *msk)
>   {
>           return true;
>   }
>   #endif
>
> It's a pretty well-defined problem, and having the magic in core code
> avoids an arch hook which would allow abuse of all sorts.
>
> And the same applies to enable_l1d_flush_for_task(). The only
> architecture-specific nonsense is the checks whether the CPU bug is
> there and whether the hardware supports L1D flushing.
>
> So we can have:
>
>   #ifdef CONFIG_HAS_PARANOID_L1D_FLUSH
>   int paranoid_l1d_enable(struct task_struct *tsk)
>   {
>           /* Do the SMT validation under the proper locks */
>           if (!res)
>                   set_tsk_thread_flag(tsk, TIF_SPEC_L1D_FLUSH);
>           return res;
>   }
>   #endif
>
>> And 2) also happens on hot-unplug, when the task's affinity gets
>> forced because it became empty. No user feedback there either, and
>> information is lost.
>
> Of course. It's both: suddenly SMT gets enabled on a core which was
> isolated, and the last isolated core in the task's CPU mask goes
> offline.
>
>> I suppose we can do 2) but send a signal. That would cover all cases
>> and keep it in arch code. But yes, that's pretty terrible too.
>
> Bah.
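As an aside, one plausible shape for the "magic stuff" in the
paranoid_l1d_valid() sketch above, a minimal sketch only (not from this
thread): walk the proposed mask under cpus_read_lock(), per the locking
discussion above, and reject it if any CPU has an online SMT sibling.
It assumes topology_sibling_cpumask() is an acceptable way to spot
siblings; a real version might use a cheaper per-CPU flag such as x86's
cpu_info.smt_active.

    /* Sketch: reject a mask containing any CPU with an online SMT sibling. */
    static bool paranoid_l1d_valid(struct task_struct *tsk,
                                   const struct cpumask *msk)
    {
            unsigned int cpu;
            bool res = true;

            if (!test_tsk_thread_flag(tsk, TIF_SPEC_L1D_FLUSH))
                    return true;

            cpus_read_lock();
            for_each_cpu(cpu, msk) {
                    /* More than one thread in the sibling mask => SMT is on */
                    if (cpumask_weight(topology_sibling_cpumask(cpu)) > 1) {
                            res = false;
                            break;
                    }
            }
            cpus_read_unlock();
            return res;
    }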
> I just looked at the condition to flush:
>
>         if (sched_smt_active() && !this_cpu_read(cpu_info.smt_active) &&
>             (prev_mm & LAST_USER_MM_L1D_FLUSH))
>                 l1d_flush_hw();
>
> That fails to flush when SMT is disabled globally. Balbir?

Yes, it should have been !sched_smt_active() || (cond), not
sched_smt_active() && (cond). I'll fix that up, but your simplification
below works as well.

> Of course this should be:
>
>         if (!this_cpu_read(cpu_info.smt_active) &&
>             (prev_mm & LAST_USER_MM_L1D_FLUSH))
>                 l1d_flush_hw();
>
> Now we can make this:
>
>         if (unlikely(prev_mm & LAST_USER_MM_L1D_FLUSH)) {
>                 if (!this_cpu_read(cpu_info.smt_active))
>                         l1d_flush_hw();
>                 else
>                         task_work_add(...);
>         }
>
> And that task work clears the flag and sends a signal. We're not going
> to send a signal from switch_mm() ....

Yes, I see the MCE handling uses a similar pattern, so SIGBUS for a
task that migrates/moves to an SMT-enabled core?

Thanks,
Balbir
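For reference, the "task work clears the flag and sends a signal" idea
could look roughly like the sketch below, mirroring the way the MCE
code queues task work to deliver a signal in task context. Two
assumptions not in the thread: a callback_head field embedded in
task_struct (called l1d_flush_kill here), and the TWA_RESUME notify
mode.

    /*
     * Sketch only. The callback runs via task_work in the victim task's
     * own context on the way back to user space, so clear_thread_flag()
     * and force_sig() operate on the right task.
     */
    static void l1d_flush_kill_fn(struct callback_head *ch)
    {
            clear_thread_flag(TIF_SPEC_L1D_FLUSH);
            force_sig(SIGBUS);
    }

    /* Replaces the task_work_add(...) placeholder in the snippet above. */
    static void queue_l1d_flush_kill(struct task_struct *tsk)
    {
            /* tsk->l1d_flush_kill: assumed callback_head in task_struct */
            init_task_work(&tsk->l1d_flush_kill, l1d_flush_kill_fn);
            task_work_add(tsk, &tsk->l1d_flush_kill, TWA_RESUME);
    }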