----- On Jul 2, 2018, at 7:37 PM, Andy Lutomirski luto@xxxxxxxxxxxxxx wrote: >> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> >> wrote: >> >> ----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers >> mathieu.desnoyers@xxxxxxxxxxxx wrote: >> >>> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx >>> wrote: >>> >>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers >>>> <mathieu.desnoyers@xxxxxxxxxxxx> wrote: >>>>> >>>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space >>>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize >>>>> the padding with 0, and only update the low bits with single-copy atomicity. >>>> >>>> Well... It's actually still single-copy atomicity as a 64-bit value. >>>> >>>> Why? Because it doesn't matter how you write the upper bits. You'll be >>>> writing the same value to them (zero) anyway. >>>> >>>> So who cares if the write ends up being two instructions, because the >>>> write to the upper bits doesn't actually *do* anything. >>>> >>>> Hmm? >>> >>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture >>> won't be torn into something daft like byte-per-byte stores when performed >>> from C code ? >>> >>> I don't worry whether the upper bits get updated or how, but I really care >>> about not having store tearing of the low bits update. >> >> For the records, most updates of those low bits are done in assembly >> from critical sections, for which we control exactly how the update is >> performed. >> >> However, there is one helper function in user-space that updates that value >> from C through a volatile store, e.g.: >> >> static inline void rseq_prepare_unload(void) >> { >> __rseq_abi.rseq_cs = 0; >> } > > How about making the field be: > > union { > __u64 rseq_cs; > struct { > __u32 rseq_cs_low; > __u32 rseq_cs_high; > }; > }; > > 32-bit user code that cares about performance can just write to rseq_cs_low > because it already knows that rseq_cs_high == 0. > > The header could even supply a static inline helper write_rseq_cs() that > atomically writes a pointer and just does the right thing for 64-bit, for > 32-bit BE, and for 32-bit LE. > > I think the union really is needed because we can’t rely on user code being > built with -fno-strict-aliasing. Or the helper could use inline asm. > > Anyway, the point is that we get optimal code generation (a single instruction > write of the correct number of bits) without any compat magic in the kernel. That works for me! Any objection from anyone else for this approach ? Thanks, Mathieu > >> >> Thanks, >> >> Mathieu >> >>> >>> Thanks, >>> >>> Mathieu >>> >>> >>> -- >>> Mathieu Desnoyers >>> EfficiOS Inc. >>> http://www.efficios.com >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. > > http://www.efficios.com -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html