----- On Jul 2, 2018, at 9:19 PM, Mathieu Desnoyers mathieu.desnoyers@xxxxxxxxxxxx wrote: > ----- On Jul 2, 2018, at 7:37 PM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote: > >>> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> >>> wrote: >>> >>> ----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers >>> mathieu.desnoyers@xxxxxxxxxxxx wrote: >>> >>>> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx >>>> wrote: >>>> >>>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers >>>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote: >>>>>> >>>>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space >>>>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize >>>>>> the padding with 0, and only update the low bits with single-copy atomicity. >>>>> >>>>> Well... It's actually still single-copy atomicity as a 64-bit value. >>>>> >>>>> Why? Because it doesn't matter how you write the upper bits. You'll be >>>>> writing the same value to them (zero) anyway. >>>>> >>>>> So who cares if the write ends up being two instructions, because the >>>>> write to the upper bits doesn't actually *do* anything. >>>>> >>>>> Hmm? >>>> >>>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture >>>> won't be torn into something daft like byte-per-byte stores when performed >>>> from C code ? >>>> >>>> I don't worry whether the upper bits get updated or how, but I really care >>>> about not having store tearing of the low bits update. >>> >>> For the records, most updates of those low bits are done in assembly >>> from critical sections, for which we control exactly how the update is >>> performed. >>> >>> However, there is one helper function in user-space that updates that value >>> from C through a volatile store, e.g.: >>> >>> static inline void rseq_prepare_unload(void) >>> { >>> __rseq_abi.rseq_cs = 0; >>> } >> >> How about making the field be: >> >> union { >> __u64 rseq_cs; >> struct { >> __u32 rseq_cs_low; >> __u32 rseq_cs_high; >> }; >> }; >> >> 32-bit user code that cares about performance can just write to rseq_cs_low >> because it already knows that rseq_cs_high == 0. >> >> The header could even supply a static inline helper write_rseq_cs() that >> atomically writes a pointer and just does the right thing for 64-bit, for >> 32-bit BE, and for 32-bit LE. >> >> I think the union really is needed because we can’t rely on user code being >> built with -fno-strict-aliasing. Or the helper could use inline asm. >> >> Anyway, the point is that we get optimal code generation (a single instruction >> write of the correct number of bits) without any compat magic in the kernel. > > That works for me! Any objection from anyone else for this approach ? One thing to consider is how we will implement the load of that pointer on the kernel side. Strictly-speaking, the rseq uapi talks about single-copy atomicity, and does not specify _which_ thread is expected to update that pointer. So arguably, the common case is that the current thread is updating it, which would allow the kernel to read it piece-wise. However, nothing prevents user-space from updating it from another thread with single-copy atomicity. So in order to be on the safe side, I prefer to guarantee single-copy atomicity of the get_user() load from the kernel that reads this pointer. This means a 32-bit kernel would have to perform two independent loads: one for low bits, one for high bits. So it does look like we need some __LP64__ ifdefery even with the union trick. Therefore, I'm not convinced the union is useful at all. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html