Hi,

On Tue, 7 Feb 2023 19:41:02 -0800, Paul E. McKenney wrote:
> On Wed, Feb 08, 2023 at 12:07:20AM -0300, Leonardo Brás wrote:
>> Hello Paul,
>>
>> I have been reading the book, until I stumbled on Quick Quiz 3.7,
>> Table E.1: Performance of Synchronization Mechanisms
>> on 16-CPU 2.8 GHz Intel X5550 (Nehalem) System
>>
>> <Copying from source, since the PDF is a little tricky>
>>
>> The first part looks like:
>>
>> Clock period & 0.4 & 1.0 \\
>> Same-CPU CAS & 12.2 & 33.8 \\
>> Same-CPU lock & 25.6 & 71.2 \\
>> Blind CAS & 12.9 & 35.8 \\
>> CAS & 7.0 & 19.4 \\
>>
>> In this case, what would be the last lines "Blind CAS" and "CAS" referring to ?
>>
>> (For a second I thought it could be "In-Core Blind CAS" and "In-Core CAS" like
>> in Table 3.1, but that would not make sense: This "CAS" is faster than the
>> previous "Same-CPU CAS". )
>
> I was surprised myself, but those measurements are quite real.  My best
> guess is that the two threads in the core are able to overlap their
> accesses, while the single CPU must do everything sequentially.

Paul, do you remember how you obtained the data set?
There are several data sets under CodeSamples/cpu/data/, but I don't
see the one that corresponds to the table.

The code for collecting these data was added in CodeSamples/cpu/ by
commit 81989d7483e2 ("cpu: Reproduce the old cache-to-cache latency
measurement code") in 2020.  The next commit, 2fc05ca07edc
("api-pthreads.h: Use clock_gettime() and check sched_setaffinity()"),
improved the stability of the reproduced code.

This table was first added in commit 38fd945ff401 ("Fill out CPU
chapter, including adding Nehalem data.") in 2009.  The data have
never been updated since.

I kind of suspect that the "7.0 ns", which surprised you at the time,
might have been an outlier due to some disturbance of the kind
discussed in Appendix A.3 "What Time Is It?".

I'm not sure, just guessing...

        Thanks, Akira

>
> Strange, but whatever the reason, true!  ;-)
>
> Thanx, Paul