I am not all that worried about it, mostly curious. get_timestamp()
isn't used all that heavily, and it seems plausible that a 40-nanosecond
timestamp would work in rcu_ts.[ch].

Fun times, though!

							Thanx, Paul

On Tue, Aug 16, 2022 at 07:43:57PM -0400, Elad Lahav wrote:
> I also ran the count_end test on Linux/aarch64 on the same board, and
> got the same results as on QNX. No surprise there, but the discrepancy
> with the x86_64 results on Linux made me want to double-check that the
> QNX build is not doing something stupid. I guess that what it shows is
> that, Apple's M1 chip notwithstanding, ARM still hasn't closed the gap
> when it comes to raw performance. You can always claim that we are
> comparing apples and oranges here, but I noticed the same trend on
> many different boards.
>
> --Elad
>
> On Sun, 14 Aug 2022 at 23:59, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Sun, Aug 14, 2022 at 06:11:44AM -0400, Elad Lahav wrote:
> > > On 2022-08-13 19:29, Paul E. McKenney wrote:
> > > > On Sat, Aug 13, 2022 at 07:23:40PM -0400, Elad Lahav wrote:
> > > > > I believe that performance is statistically the same, but I will
> > > > > double check. I assume both GCC and C11 end up using the same
> > > > > underlying mechanism for thread-local storage:
> > > > >
> > > > > https://uclibc.org/docs/tls.pdf
> > > > >
> > > > > If not implemented, TLS falls back on pthread_[gs]etspecific(), but
> > > > > again it would be the same for __thread and _Thread_local.
> > > >
> > > > Sounds likely to me, but I have been surprised before.
> > >
> > > Results:
> > >
> > > Linux/x86-64, i7-8550U, GCC TLS
> > >
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 408000 n_updates: 689366000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 588.235 ns/update: 0.348146
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 443000 n_updates: 762876000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 541.761 ns/update: 0.314599
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 395000 n_updates: 666718000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 607.595 ns/update: 0.359972
> > >
> > > Linux/x86-64, i7-8550U, C11 TLS
> > >
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 410000 n_updates: 684880000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 585.366 ns/update: 0.350426
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 411000 n_updates: 698148000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 583.942 ns/update: 0.343767
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 409000 n_updates: 704072000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 586.797 ns/update: 0.340874
> > >
> > > QNX/aarch64, NXP LX2160A, GCC TLS (emulated)
> > >
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 221000 n_updates: 67876000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 1085.97 ns/update: 3.53586
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 227000 n_updates: 67901000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 1057.27 ns/update: 3.53456
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 211000 n_updates: 65043000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 1137.44 ns/update: 3.68987
> > >
> > > QNX/aarch64, NXP LX2160A, C11 TLS (emulated)
> > >
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 217000 n_updates: 67814000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 1105.99 ns/update: 3.53909
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 223000 n_updates: 67860000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 1076.23 ns/update: 3.53669
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 218000 n_updates: 68116000 nreaders: 1 nupdaters: 1 duration: 240
> > > ns/read: 1100.92 ns/update: 3.5234
> > >
> > > Looking at the disassembly for the QNX binary I realized that it is still
> > > using the emulated TLS option (i.e., the compiler generates a shim layer
> > > that uses pthread_[gs]etspecific()). I will need to rebuild the compiler
> > > with native TLS support and retest, though I doubt it will have a
> > > significant impact.
> >
> > That would explain the lack of statistical significance. If there is
> > a change, I would hope that the new style is faster.
> >
> > > In any case, both the native and the emulated TLS options show the same
> > > results with the GCC and C11 versions.
> >
> > Sounds good, and thank you for checking!
> >
> > Thanx, Paul
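
For anyone wanting to cross-check the per-operation figures quoted above, here is a minimal sketch of the arithmetic. It assumes that the "duration" field is in milliseconds, which is not stated in the thread but is consistent with the printed numbers. The function names ns_per_op(), slowdown() and check_reported_rates() are hypothetical and are not taken from the perfbook sources.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Nanoseconds per operation for one count_end run.
 * Assumption: duration_ms is in milliseconds (1 ms = 1e6 ns).
 */
double ns_per_op(double duration_ms, double n_ops)
{
	return duration_ms * 1e6 / n_ops;
}

/* How many times slower the first figure is than the second. */
double slowdown(double slow_ns, double fast_ns)
{
	return slow_ns / fast_ns;
}

/*
 * Recompute the first GCC-TLS run on each platform from its raw counts
 * and compare against the printed ns/read and ns/update values.
 */
void check_reported_rates(void)
{
	double x86_read = ns_per_op(240, 408000);	/* printed: 588.235 */
	double x86_upd = ns_per_op(240, 689366000);	/* printed: 0.348146 */
	double arm_read = ns_per_op(240, 221000);	/* printed: 1085.97 */
	double arm_upd = ns_per_op(240, 67876000);	/* printed: 3.53586 */

	assert(x86_read > 588.23 && x86_read < 588.24);
	assert(x86_upd > 0.34814 && x86_upd < 0.34815);
	assert(arm_read > 1085.96 && arm_read < 1085.98);
	assert(arm_upd > 3.5358 && arm_upd < 3.5359);

	printf("read slowdown:   %.2fx\n", slowdown(arm_read, x86_read));
	printf("update slowdown: %.2fx\n", slowdown(arm_upd, x86_upd));
}
```

Calling check_reported_rates() from a main() reproduces the printed rates for those two runs. For that pair of runs the LX2160A comes out roughly ten times slower on updates and a bit under two times slower on reads, while the GCC and C11 TLS variants stay within run-to-run noise of each other on both platforms.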