Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

"H. Peter Anvin" <hpa@xxxxxxxxx> · Sat, 27 Feb 2016 07:04:18 -0800

On February 27, 2016 6:15:01 AM PST, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>----- On Feb 27, 2016, at 1:24 AM, H. Peter Anvin hpa@xxxxxxxxx wrote:
>
>> On 02/26/16 16:40, Mathieu Desnoyers wrote:
>>>>
>>>> I think it would be a good idea to make this a general pointer for
>the kernel to
>>>> be able to write per thread state to user space, which obviously
>can't be done
>>>> with the vDSO.
>>>>
>>>> This means the libc per thread startup should query the kernel for
>the size of
>>>> this structure and allocate thread local data accordingly.  We can
>then grow
>>>> this structure if needed without making the ABI even more complex.
>>>>
>>>> This is more than a system call: this is an entirely new way for
>userspace to
>>>> interact with the kernel.  Therefore we should make it a general
>facility.
>>>
>>> I'm really glad to see I'm not the only one seeing potential for
>>> genericity here. :-) This is exactly what I had in mind
>>> last year when proposing the thread_local_abi() system call:
>>> a generic way to register an extensible per-thread data structure
>>> so the kernel can communicate with user-space and vice-versa.
>>>
>>> Rather than having the libc query the kernel for size of the
>structure,
>>> I would recommend that libc tells the kernel the size of the
>thread-local
>>> ABI structure it supports. The idea here is that both the kernel and
>libc
>>> need to know about the fields in that structure to allow a two-way
>>> interaction. Fields known only by either the kernel or userspace
>>> are useless for a given thread anyway. This way, libc could
>statically
>>> define the structure.
>> 
>> Big fat NOPE there.  Why?  Because it means that EVERY interaction
>with
>> this memory, no matter how critical, needs to be conditionalized.
>> Furthermore, userspace != libc.  Applications or higher-layer
>libraries
>> might have more information than the running libc about additional
>> fields, but with your proposal libc would gate them.
>
>Good point!
>
>> 
>> As far as the kernel providing the size in the structure (alone) -- I
>> *really* hope you can see what is wrong with that!!  That doesn't
>mean
>> we can't provide it in the structure as well, and that too might
>avoid
>> the skipped libc problem.
>
>Indeed, libc would need to query the size before it can allocate
>the structure.
>
>> 
>>> I would be tempted to also add "features" flags, so both user-space
>>> and the kernel could tell each other what they support: user-space
>>> would announce the set of features it supports, and it could also
>>> query the kernel for the set of supported features. One simple
>approach
>>> would be to use a uint64_t as type for those feature flags, and
>>> reserve the last bit for extending to future flags if we ever have
>>> more than 64.
>>>
>>> Thoughts ?
>> 
>> It doesn't seem like it would hurt, although the size of the flags
>field
>> could end up being an issue.
>
>I'm concerned that this thread-local ABI structure may become messy.
>Let's just imagine how we would first introduce a "cpu_id" field
>(int32_t),
>and eventually add a "seqnum" field for rseq in the future (unsigned
>long).
>
>Both fields need to be read with single-copy semantics as volatile
>reads, and both need to be naturally aligned. However, I'm tempted
>to use the "packed" attribute on the structure since it's an ABI
>between kernel and user-space. A pretty bad example of what this
>could become, due to alignment constraints, looks like:
>
>/* This structure needs to be aligned on pointer size. */
>struct thread_local_abi {
>        int32_t cpu_id;
>        int32_t __unused1;
>        unsigned long seqnum;
>        /* Add new fields at the end. */
>} __attribute__((packed));
>
>And this is just a start. It may become messier as we append
>new fields in the future.
>
>The main argument I currently see in favor of having this
>meta system call for all per-thread features is to only
>maintain a single pointer in the kernel task_struct rather
>than one per thread-local feature.
>
>If the goal is really to keep the burden on the task struct
>small, we could use kmalloc()/kfree() to allocate and free an
>array of pointers to the various per-thread features, rather
>than putting them directly in task_struct. We could keep a
>mask of the enabled features in the task struct too (which
>we will likely have to do even if we go the the thread-local
>ABI meta system call).
>
>Having this per-task allocated pointer array at kernel-level
>would allow us to have one system call per feature, with clear
>semantics, without evolving a messy thread-local ABI structure
>due to all sorts of alignment constraints.
>
>Thoughts ?
>
>Thanks,
>
>Mathieu

I think you are worried about problems which we have already solved many, many times - structures are very common in the user space ABI and we know how to deal with this.

And when you say:

> However, I'm tempted
> to use the "packed" attribute on the structure
> since it's an ABI
> between kernel and user-space.

and mention "unsigned long" in a user space ABI all I can think of that you really have not followed the issues of user space ABI design as they have evolved over the last 20 years.

Simply put: non-problem.  
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html