----- On Dec 2, 2020, at 10:35 AM, Andy Lutomirski luto@xxxxxxxxxx wrote:

> membarrier() does not explicitly sync_core() remote CPUs; instead, it
> relies on the assumption that an IPI will result in a core sync.  On
> x86, I think this may be true in practice, but it's not architecturally
> reliable.  In particular, the SDM and APM do not appear to guarantee
> that interrupt delivery is serializing.  While IRET does serialize, IPI
> return can schedule, thereby switching to another task in the same mm
> that was sleeping in a syscall.  The new task could then SYSRET back to
> usermode without ever executing IRET.
>
> Make this more robust by explicitly calling sync_core_before_usermode()
> on remote cores.  (This also helps people who search the kernel tree for
> instances of sync_core() and sync_core_before_usermode() -- one might be
> surprised that the core membarrier code doesn't currently show up in
> such a search.)
>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>

> ---
>  kernel/sched/membarrier.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 6251d3d12abe..01538b31f27e 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -166,6 +166,23 @@ static void ipi_mb(void *info)
>  	smp_mb();	/* IPIs should be serializing but paranoid. */
>  }
>
> +static void ipi_sync_core(void *info)
> +{
> +	/*
> +	 * The smp_mb() in membarrier after all the IPIs is supposed to
> +	 * ensure that memory on remote CPUs that occur before the IPI
> +	 * become visible to membarrier()'s caller -- see scenario B in
> +	 * the big comment at the top of this file.
> +	 *
> +	 * A sync_core() would provide this guarantee, but
> +	 * sync_core_before_usermode() might end up being deferred until
> +	 * after membarrier()'s smp_mb().
> +	 */
> +	smp_mb();	/* IPIs should be serializing but paranoid. */
> +
> +	sync_core_before_usermode();
> +}
> +
>  static void ipi_rseq(void *info)
>  {
>  	/*
> @@ -301,6 +318,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
>  		if (!(atomic_read(&mm->membarrier_state) &
>  		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
>  			return -EPERM;
> +		ipi_func = ipi_sync_core;
>  	} else if (flags == MEMBARRIER_FLAG_RSEQ) {
>  		if (!IS_ENABLED(CONFIG_RSEQ))
>  			return -EINVAL;
> --
> 2.28.0

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com