Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Sun, 12 Jul 2015 20:38:33 -0700

On Jul 12, 2015 12:06 PM, "Mathieu Desnoyers"
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> Expose a new system call allowing threads to register a userspace memory
> area where to store the current CPU number. Scheduler migration sets the
> TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space,
> a notify-resume handler updates the current CPU value within that
> user-space memory area.
>
> This getcpu cache is an alternative to the sched_getcpu() vdso which has
> a few benefits:
> - It is faster to do a memory read that to call a vDSO,
> - This cache value can be read from within an inline assembly, which
>   makes it a useful building block for restartable sequences.
>

Let's wait and see what the final percpu atomic solution is.  If it
involves percpu segments, then this is unnecessary.

Also, this will need to be rebased onto -tip, and that should wait
until the big exit rewrite is farther along.

> This approach is inspired by Paul Turner and Andrew Hunter's work
> on percpu atomics, which lets the kernel handle restart of critical
> sections:
> Ref.:
> * https://lkml.org/lkml/2015/6/24/665
> * https://lwn.net/Articles/650333/
> * http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
>
> Benchmarking sched_getcpu() vs tls cache approach. Getting the
> current CPU number:
>
> - With Linux vdso:            12.7 ns

This is a bit unfair, because the glibc wrapper sucks and the
__vdso_getcpu interface is overcomplicated.  We can fix it with a
better API.  It won't make it *that* much faster, though.

>
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index e504246..157cec0 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -750,6 +750,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
>         if (thread_info_flags & _TIF_NOTIFY_RESUME) {
>                 clear_thread_flag(TIF_NOTIFY_RESUME);
>                 tracehook_notify_resume(regs);
> +               if (getcpu_cache_active(current))
> +                       getcpu_cache_handle_notify_resume(current);

We need to disentangle this stuff.  This is buried way too deeply here.

Fortunately, do_notify_resume is going away.  It's already unused on
64-bit kernels in -next.

> +/*
> + * This resume handler should always be executed between a migration
> + * triggered by preemption and return to user-space.
> + */
> +void getcpu_cache_handle_notify_resume(struct task_struct *t)
> +{
> +       int32_t __user *gcp = t->getcpu_cache;
> +
> +       if (gcp == NULL)
> +               return;
> +       if (unlikely(t->flags & PF_EXITING))
> +               return;
> +       /*
> +        * access_ok() of gcp_user has already been checked by
> +        * sys_getcpu_cache().
> +        */
> +       if (__put_user(raw_smp_processor_id(), gcp))
> +               force_sig(SIGSEGV, current);

We're preemptible here, although I think it's okay.  But I'd at least
clear the getcpu_cache state if __put_user fails, because otherwise
it's not entirely obvious to me that we can't infinite loop.

> +/*
> + * sys_getcpu_cache - setup getcpu cache for caller thread
> + */
> +SYSCALL_DEFINE2(getcpu_cache, int32_t __user *, gcp, int, flags)
> +{
> +       if (flags)
> +               return -EINVAL;
> +       if (gcp != NULL && !access_ok(VERIFY_WRITE, gcp, sizeof(int32_t)))
> +               return -EFAULT;
> +       current->getcpu_cache = gcp;
> +       /* Will update *gcp on resume */
> +       if (gcp)
> +               set_thread_flag(TIF_NOTIFY_RESUME);
> +       return 0;
> +}

IMO this is impolite.  If the pointer is bad, we should return -EFAULT
rather than sending SIGSEGV.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html