I believe the performance is statistically the same, but I will
double-check. I assume both GCC and C11 end up using the same underlying
mechanism for thread-local storage:
https://uclibc.org/docs/tls.pdf
Where native TLS is not implemented, it falls back on
pthread_[gs]etspecific(), but again that would be the same for __thread
and _Thread_local.

I was about to run all the benchmarks on a cute 16-core aarch64 machine I
recently bought for testing the scalability of my latest kernel work, but
the code requires linking against urcu, and now I'm going down the rabbit
hole of building that for QNX. Does the counter code really need this
library?

--Elad

On Sat, 13 Aug 2022 at 19:12, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sat, Aug 13, 2022 at 07:49:27AM -0400, Elad Lahav wrote:
> > On 2022-08-13 07:45, Elad Lahav wrote:
> > > Signed-off-by: Elad Lahav <e2lahav@xxxxxxxxx>
> > > ---
> >
> > This one will probably require some back-and-forth. If we change it here, do
> > other places need to be updated? Have I now removed the explanation of GCC's
> > __thread storage class, which will impact the readability of other sections?
> > Or are you ready to take the plunge and convert all code snippets?
> >
> > For the record, I did build and test count_end.c.
>
> As long as you continue building and testing, checking the descriptions in
> text of the code snippets (in case there is a mention of __thread there),
> and verifying that the PDF still builds, as fast as you are willing to go!
>
> Are you seeing the same performance with the new as with the old on
> your hardware? (No statistically significant difference on my laptop,
> but figured I should ask.)
>
> Thanx, Paul