Re: [PATCH] count: Switch from GCC to C11 thread-local storage

I also ran the count_end test on Linux/aarch64 on the same board and
got the same results as on QNX. No surprise there, but the discrepancy
with the x86_64 results on Linux made me want to double-check that the
QNX build is not doing something stupid. I guess what it shows is
that, Apple's M1 chip notwithstanding, ARM still hasn't closed the gap
when it comes to raw performance: in the runs below, the aarch64 board
takes roughly 2x as long per read and 10x as long per update as the
i7-8550U. You can always claim that we are comparing apples and
oranges here, but I have noticed the same trend on many different
boards.

--Elad

On Sun, 14 Aug 2022 at 23:59, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sun, Aug 14, 2022 at 06:11:44AM -0400, Elad Lahav wrote:
> > On 2022-08-13 19:29, Paul E. McKenney wrote:
> > > On Sat, Aug 13, 2022 at 07:23:40PM -0400, Elad Lahav wrote:
> > > > I believe that performance is statistically the same, but I will
> > > > double check. I assume both GCC and C11 end up using the same
> > > > underlying mechanism for thread-local storage:
> > > >
> > > > https://uclibc.org/docs/tls.pdf
> > > >
> > > > If native TLS is not available, it falls back on
> > > > pthread_[gs]etspecific(), but again it would be the same for
> > > > __thread and _Thread_local.
> > >
> > > Sounds likely to me, but I have been surprised before.
> >
> > Results:
> >
> > Linux/x86-64, i7-8550U, GCC TLS
> >
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 408000  n_updates: 689366000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 588.235  ns/update: 0.348146
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 443000  n_updates: 762876000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 541.761  ns/update: 0.314599
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 395000  n_updates: 666718000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 607.595  ns/update: 0.359972
> >
> > Linux/x86-64, i7-8550U, C11 TLS
> >
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 410000  n_updates: 684880000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 585.366  ns/update: 0.350426
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 411000  n_updates: 698148000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 583.942  ns/update: 0.343767
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 409000  n_updates: 704072000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 586.797  ns/update: 0.340874
> >
> > QNX/aarch64, NXP LX2160A, GCC TLS (emulated)
> >
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 221000  n_updates: 67876000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 1085.97  ns/update: 3.53586
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 227000  n_updates: 67901000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 1057.27  ns/update: 3.53456
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 211000  n_updates: 65043000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 1137.44  ns/update: 3.68987
> >
> > QNX/aarch64, NXP LX2160A, C11 TLS (emulated)
> >
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 217000  n_updates: 67814000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 1105.99  ns/update: 3.53909
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 223000  n_updates: 67860000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 1076.23  ns/update: 3.53669
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 218000  n_updates: 68116000  nreaders: 1  nupdaters: 1 duration: 240
> > ns/read: 1100.92  ns/update: 3.5234
> >
> > Looking at the disassembly for the QNX binary, I realized that it is still
> > using the emulated TLS option (i.e., the compiler generates a shim layer
> > that uses pthread_[gs]etspecific()). I will need to rebuild the compiler
> > with native TLS support and retest, though I doubt it will have a
> > significant impact.
>
> That would explain the lack of statistical significance.  If there is
> a change, I would hope that the new style is faster.
>
> > In any case, both the native and the emulated TLS options show the same
> > results with the GCC and C11 versions.
>
> Sounds good, and thank you for checking!
>
>                                                         Thanx, Paul
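
For reference, here is a minimal sketch of the three mechanisms discussed in the quoted thread: the GCC __thread spelling, the C11 _Thread_local spelling, and a hand-written per-thread slot built on pthread_getspecific() and pthread_setspecific(). The third is only an illustration of the general idea behind emulated TLS, not the shim that the compiler actually generates. All names (count_gcc, count_c11, count_emu, inc_all, check_per_thread) are made up for this example and are not taken from the perfbook sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* The same per-thread counter, declared with each spelling. */
static __thread unsigned long count_gcc;       /* GCC extension */
static _Thread_local unsigned long count_c11;  /* C11 keyword */

/* Hand-written fallback: one key, one heap-allocated slot per thread. */
static pthread_key_t count_key;
static pthread_once_t count_once = PTHREAD_ONCE_INIT;

static void count_key_init(void)
{
    /* free() releases a thread's slot when that thread exits. */
    int rc = pthread_key_create(&count_key, free);

    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's slot, allocating it zeroed on first use. */
static unsigned long *count_emu(void)
{
    unsigned long *p;

    pthread_once(&count_once, count_key_init);
    p = pthread_getspecific(count_key);
    if (p == NULL) {
        p = calloc(1, sizeof(*p));
        assert(p != NULL);
        pthread_setspecific(count_key, p);
    }
    return p;
}

/* Bump all three counters; return 1 if they agree on the new value. */
int inc_all(void)
{
    unsigned long a = ++count_gcc;
    unsigned long b = ++count_c11;
    unsigned long c = ++*count_emu();

    return a == b && b == c;
}

/* Thread body: a new thread must start from zero in all three counters. */
static void *fresh_thread(void *arg)
{
    int *ok = arg;

    *ok = inc_all() && count_gcc == 1;
    return NULL;
}

/* Return 1 if a second thread sees its own zero-initialized copies. */
int check_per_thread(void)
{
    pthread_t tid;
    int ok = 0;

    inc_all();  /* make the calling thread's copies nonzero first */
    if (pthread_create(&tid, NULL, fresh_thread, &ok) != 0)
        return 0;
    pthread_join(tid, NULL);
    return ok;
}
```

Build with -pthread. The two keyword spellings differ only at the declaration, while the key-based version pays for a function call and a lookup on every access. That is the sort of overhead an emulated-TLS build would be expected to carry.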

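The ns/read and ns/update figures in the quoted runs are consistent with the duration divided by the operation count, assuming the duration is in milliseconds. That unit is inferred from the numbers and is not stated anywhere in the thread. A small sketch of the arithmetic, with ns_per_op as a made-up helper name:

```c
#include <assert.h>

/*
 * Nanoseconds per operation, assuming duration_ms is in milliseconds
 * (an inference from the reported figures, not a documented unit).
 */
double ns_per_op(double duration_ms, double n_ops)
{
    assert(n_ops > 0.0);
    return duration_ms * 1e6 / n_ops;
}
```

For the first x86-64 run, ns_per_op(240, 408000) is about 588.235 and ns_per_op(240, 689366000) is about 0.348146, matching the reported ns/read and ns/update. For the first QNX run, ns_per_op(240, 67876000) is about 3.53586.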

