Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Mon, 2 Jul 2018 16:37:55 -0700

> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> 
> ----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote:
> 
>> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx
>> wrote:
>> 
>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>>>> 
>>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
>>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
>>>> the padding with 0, and only update the low bits with single-copy atomicity.
>>> 
>>> Well... It's actually still single-copy atomicity as a 64-bit value.
>>> 
>>> Why? Because it doesn't matter how you write the upper bits. You'll be
>>> writing the same value to them (zero) anyway.
>>> 
>>> So who cares if the write ends up being two instructions, because the
>>> write to the upper bits doesn't actually *do* anything.
>>> 
>>> Hmm?
>> 
>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>> won't be torn into something daft like byte-per-byte stores when performed
>> from C code ?
>> 
>> I don't worry whether the upper bits get updated or how, but I really care
>> about not having store tearing of the low bits update.
> 
> For the records, most updates of those low bits are done in assembly
> from critical sections, for which we control exactly how the update is
> performed.
> 
> However, there is one helper function in user-space that updates that value
> from C through a volatile store, e.g.:
> 
> static inline void rseq_prepare_unload(void)
> {
>        __rseq_abi.rseq_cs = 0;
> }

How about making the field be:

union {
 __u64 rseq_cs;
 struct {
   __u32 rseq_cs_low;
   __u32 rseq_cs_high;
 };
};

32-bit user code that cares about performance can just write to rseq_cs_low because it already knows that rseq_cs_high == 0.

The header could even supply a static inline helper write_rseq_cs() that atomically writes a pointer and just does the right thing for 64-bit, for 32-bit BE, and for 32-bit LE.

I think the union really is needed because we can’t rely on user code being built with -fno-strict-aliasing.  Or the helper could use inline asm.

Anyway, the point is that we get optimal code generation (a single instruction write of the correct number of bits) without any compat magic in the kernel.

> 
> Thanks,
> 
> Mathieu
> 
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html