On Fri, 27 Oct 2023 12:35:56 -0400
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

> > Does that make more sense?
>
> Not really.
>
> Please see my other email about the need for a reference count here, for
> nested locks use-cases.

Note, my original implementation of nested locking was done completely in
user space.

	int __thread lock_cnt;

	extend()
	{
		if (lock_cnt++)
			return;
		...
	}

	unextend()
	{
		if (--lock_cnt)
			return;
		...
	}

>
> By "atomic" operation I suspect you only mean "single instruction" which can
> alter the state of the field and keep its prior content in a register, not a
> lock-prefixed atomic operation, right ?

Correct. Just a per-CPU atomic. Hence an "andb" instruction, or the "subl",
or whatever.

>
> The only reason why you have this asm trickiness is because both states
> are placed into different bits from the same word, which is just an
> optimization. You could achieve the same much more simply by splitting
> this state in two different words, e.g.:
>
> extend() {
>   WRITE_ONCE(__rseq_abi->cr_nest, __rseq_abi->cr_nest + 1);
>   barrier();
> }
>
> unextend() {
>   barrier();
>   WRITE_ONCE(__rseq_abi->cr_nest, __rseq_abi->cr_nest - 1);
>   if (READ_ONCE(__rseq_abi->must_yield)) {
>     WRITE_ONCE(__rseq_abi->must_yield, 0);
>     sched_yield();
>   }
> }
>
> Or am I missing something ?

I mentioned placing this in different bytes, although I meant words, but
yeah, if we make them separate it would make it easier. But with me being
frugal about memory, if this was just two bits (or even a counter with an
extra bit), I didn't think about wasting two words for what can be done
with one.

But this is still an implementation detail, and this code is still very
much in flux, and I'm not as worried about those details yet.

-- Steve
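
For readers following along, here is a rough user-space sketch of the
two-word variant quoted above. It is only an illustration under stated
assumptions: `struct cr_state`, `cr_abi`, `cr_yield_count`, and the `CR_*` /
`cr_*` helpers are hypothetical names made up for this sketch, the
`cr_nest` and `must_yield` fields are taken from the quoted pseudocode and
are not part of any existing rseq ABI, and nothing here sets `must_yield`
the way a kernel would. It relies on the GCC/Clang `__thread` and
`__typeof__` extensions.

```c
#include <sched.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed fields; NOT the real struct rseq. */
struct cr_state {
	uint32_t cr_nest;	/* nesting depth, written only by this thread */
	uint32_t must_yield;	/* would be set by the kernel; here only by a test */
};

/* One instance per thread, like the per-thread rseq area. */
static __thread struct cr_state cr_abi;

/* Illustration only: counts how often cr_unextend() called sched_yield(). */
static __thread unsigned int cr_yield_count;

/* Minimal user-space equivalents of the kernel helpers used in the thread. */
#define CR_WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))
#define CR_READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))
#define cr_barrier()		__asm__ __volatile__("" ::: "memory")

/* Enter a (possibly nested) critical section: bump the nesting count. */
static void cr_extend(void)
{
	CR_WRITE_ONCE(cr_abi.cr_nest, cr_abi.cr_nest + 1);
	cr_barrier();	/* keep the critical section after the increment */
}

/*
 * Leave a critical section. As in the quoted pseudocode, must_yield is
 * checked on every call, including nested ones.
 */
static void cr_unextend(void)
{
	cr_barrier();	/* keep the critical section before the decrement */
	CR_WRITE_ONCE(cr_abi.cr_nest, cr_abi.cr_nest - 1);
	if (CR_READ_ONCE(cr_abi.must_yield)) {
		CR_WRITE_ONCE(cr_abi.must_yield, 0);
		cr_yield_count++;
		sched_yield();
	}
}
```

Because only the owning thread writes `cr_nest`, and the other side would
only write `must_yield`, plain loads and stores are enough in this sketch.
That is the simplification the two-word layout buys, at the cost of the
extra word discussed above.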