Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)

Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> · Mon, 13 Jul 2015 15:27:32 +0000 (UTC)

----- On Jul 12, 2015, at 11:38 PM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote:

> On Jul 12, 2015 12:06 PM, "Mathieu Desnoyers"
> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>
>> Expose a new system call allowing threads to register a userspace memory
>> area where to store the current CPU number. Scheduler migration sets the
>> TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space,
>> a notify-resume handler updates the current CPU value within that
>> user-space memory area.
>>
>> This getcpu cache is an alternative to the sched_getcpu() vdso which has
>> a few benefits:
>> - It is faster to do a memory read that to call a vDSO,
>> - This cache value can be read from within an inline assembly, which
>>   makes it a useful building block for restartable sequences.
>>
> 
> Let's wait and see what the final percpu atomic solution is.  If it
> involves percpu segments, then this is unnecessary.

percpu segments will likely not solve everything. I have a use-case
with dynamically allocated per-cpu ring buffer in user-space (lttng-ust)
which can be a challenge for percpu segments. Having a fast getcpu()
is a win in those cases.

> 
> Also, this will need to be rebased onto -tip, and that should wait
> until the big exit rewrite is farther along.

I don't really care which thread flag it ends up using, and this is
more or less an internal implementation detail. The important part is
the ABI exposed to user-space, and it's good to start the discussion
on this aspect early.

> 
>> This approach is inspired by Paul Turner and Andrew Hunter's work
>> on percpu atomics, which lets the kernel handle restart of critical
>> sections:
>> Ref.:
>> * https://lkml.org/lkml/2015/6/24/665
>> * https://lwn.net/Articles/650333/
>> *
>> http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
>>
>> Benchmarking sched_getcpu() vs tls cache approach. Getting the
>> current CPU number:
>>
>> - With Linux vdso:            12.7 ns
> 
> This is a bit unfair, because the glibc wrapper sucks and the
> __vdso_getcpu interface is overcomplicated.  We can fix it with a
> better API.  It won't make it *that* much faster, though.

Even if we improve the vDSO function, we are at a point where just
the function call is not that cheap.

> 
>>
>> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
>> index e504246..157cec0 100644
>> --- a/arch/x86/kernel/signal.c
>> +++ b/arch/x86/kernel/signal.c
>> @@ -750,6 +750,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32
>> thread_info_flags)
>>         if (thread_info_flags & _TIF_NOTIFY_RESUME) {
>>                 clear_thread_flag(TIF_NOTIFY_RESUME);
>>                 tracehook_notify_resume(regs);
>> +               if (getcpu_cache_active(current))
>> +                       getcpu_cache_handle_notify_resume(current);
> 
> We need to disentangle this stuff.  This is buried way too deeply here.
> 
> Fortunately, do_notify_resume is going away.  It's already unused on
> 64-bit kernels in -next.

Cool! Of course, I'm willing to rebase this on whichever thread flag
and notification upon resume to userspace makes more sense.

> 
>> +/*
>> + * This resume handler should always be executed between a migration
>> + * triggered by preemption and return to user-space.
>> + */
>> +void getcpu_cache_handle_notify_resume(struct task_struct *t)
>> +{
>> +       int32_t __user *gcp = t->getcpu_cache;
>> +
>> +       if (gcp == NULL)
>> +               return;
>> +       if (unlikely(t->flags & PF_EXITING))
>> +               return;
>> +       /*
>> +        * access_ok() of gcp_user has already been checked by
>> +        * sys_getcpu_cache().
>> +        */
>> +       if (__put_user(raw_smp_processor_id(), gcp))
>> +               force_sig(SIGSEGV, current);
> 
> We're preemptible here, although I think it's okay.  But I'd at least
> clear the getcpu_cache state if __put_user fails, because otherwise
> it's not entirely obvious to me that we can't infinite loop.

Good point. For safety's sake, I'll set t->getcpu_cache to NULL.

> 
>> +/*
>> + * sys_getcpu_cache - setup getcpu cache for caller thread
>> + */
>> +SYSCALL_DEFINE2(getcpu_cache, int32_t __user *, gcp, int, flags)
>> +{
>> +       if (flags)
>> +               return -EINVAL;
>> +       if (gcp != NULL && !access_ok(VERIFY_WRITE, gcp, sizeof(int32_t)))
>> +               return -EFAULT;
>> +       current->getcpu_cache = gcp;
>> +       /* Will update *gcp on resume */
>> +       if (gcp)
>> +               set_thread_flag(TIF_NOTIFY_RESUME);
>> +       return 0;
>> +}
> 
> IMO this is impolite.  If the pointer is bad, we should return -EFAULT
> rather than sending SIGSEGV.

OK, so I guess you mean we should do the __put_user() in getcpu_cache
too, rather than relying on the one in notify_resume, so we can handle
faults there and return -EFAULT rather than sending SIGSEGV. Yep, it
makes sense, will fix.

Thanks!

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html