On Wed, Dec 28, 2022 at 08:26:23AM +0900, Akira Yokosawa wrote:
> Hi,
> 
> On Date: Tue, 27 Dec 2022 10:29:20 -0800, Paul E. McKenney wrote:
> > On Tue, Dec 27, 2022 at 08:06:19AM -0800, SeongJae Park wrote:
> >> Add missing unbreakable spaces for 'CPUs' and 'elements'.
> >>
> >> Signed-off-by: SeongJae Park <sj38.park@xxxxxxxxx>
> > 
> > Works for me, thank you!
> > 
> > I have queued this, and if Akira (who tests with a much wider variety
> > of environments than I do) does not object, then I will push it out.
> > 
> > 							Thanx, Paul
> > 
> >> ---
> >> Changes from v1
> >> - Fix build error by removing unbreakable space from \cref{}
> 
> Reviewed-by: Akira Yokosawa <akiyks@xxxxxxxxx>

And pushed, thank you both!

							Thanx, Paul

> Thanks, Akira
> 
> >> 
> >>  datastruct/datastruct.tex | 23 +++++++++++------------
> >>  1 file changed, 11 insertions(+), 12 deletions(-)
> >> 
> >> diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex
> >> index 99c92d9a..c095b846 100644
> >> --- a/datastruct/datastruct.tex
> >> +++ b/datastruct/datastruct.tex
> >> @@ -664,7 +664,7 @@ shows the same data on a linear scale.
> >>  This drops the global-locking trace into the x-axis, but allows the
> >>  non-ideal performance of RCU and hazard pointers to be more readily
> >>  discerned.
> >> -Both show a change in slope at 224 CPUs, and this is due to hardware
> >> +Both show a change in slope at 224~CPUs, and this is due to hardware
> >>  multithreading.
> >>  At 32 and fewer CPUs, each thread has a core to itself.
> >>  In this regime, RCU does better than does hazard pointers because the
> >> @@ -672,11 +672,11 @@ latter's read-side \IXpl{memory barrier} result in dead time within the core.
> >>  In short, RCU is better able to utilize a core from a single hardware
> >>  thread than is hazard pointers.
> >> 
> >> -This situation changes above 224 CPUs.
> >> +This situation changes above 224~CPUs.
> >>  Because RCU is using more than half of each core's resources from a
> >>  single hardware thread, RCU gains relatively little benefit from the
> >>  second hardware thread in each core.
> >> -The slope of the hazard-pointers trace also decreases at 224 CPUs, but
> >> +The slope of the hazard-pointers trace also decreases at 224~CPUs, but
> >>  less dramatically,
> >>  because the second hardware thread is able to fill in the time
> >>  that the first hardware thread is stalled due to \IXh{memory-barrier}{latency}.
> >> @@ -776,7 +776,7 @@ to about half again faster than that of either QSBR or RCU\@.
> >>  	Still unconvinced?
> >>  	Then look at the log-log plot in
> >>  	\cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size},
> >> -	which shows performance for 448 CPUs as a function of the
> >> +	which shows performance for 448~CPUs as a function of the
> >>  	hash-table size, that is, number of buckets and maximum number
> >>  	of elements.
> >>  	A hash-table of size 1,024 has 1,024~buckets and contains
> >> @@ -785,14 +785,13 @@ to about half again faster than that of either QSBR or RCU\@.
> >>  	Because this is a read-only benchmark, the actual occupancy is
> >>  	always equal to the average occupancy.
> >> 
> >> -	This figure shows near-ideal performance below about 8,000
> >> -	elements, that is, when the hash table comprises less than
> >> -	1\,MB of data.
> >> +	This figure shows near-ideal performance below about 8,000~elements,
> >> +	that is, when the hash table comprises less than 1\,MB of data.
> >>  	This near-ideal performance is consistent with that for the
> >>  	pre-BSD routing table shown in
> >>  	\cref{fig:defer:Pre-BSD Routing Table Protected by RCU}
> >>  	on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU},
> >> -	even at 448 CPUs.
> >> +	even at 448~CPUs.
> >>  	However, the performance drops significantly (this is a log-log
> >>  	plot) at about 8,000~elements, which is where the 1,048,576-byte
> >>  	L2 cache overflows.
> >> @@ -835,7 +834,7 @@ data structure represented by the pre-BSD routing table.
> >> 
> >>  \QuickQuiz{
> >>  	The memory system is a serious bottleneck on this big system.
> >> -	Why bother putting 448 CPUs on a system without giving them
> >> +	Why bother putting 448~CPUs on a system without giving them
> >>  	enough memory bandwidth to do something useful???
> >>  }\QuickQuizAnswer{
> >>  	It would indeed be a bad idea to use this large and expensive
> >> @@ -905,10 +904,10 @@ concurrency control to begin with.
> >>  \Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates}
> >>  therefore shows the effect of updates on readers.
> >>  At the extreme left-hand side of this graph, all but one of the CPUs
> >> -are doing lookups, while to the right all 448 CPUs are doing updates.
> >> +are doing lookups, while to the right all 448~CPUs are doing updates.
> >>  For all four implementations, the number of lookups per millisecond
> >>  decreases as the number of updating CPUs increases, of course reaching
> >> -zero lookups per millisecond when all 448 CPUs are updating.
> >> +zero lookups per millisecond when all 448~CPUs are updating.
> >>  Both hazard pointers and RCU do well compared to per-bucket locking
> >>  because their readers do not increase update-side lock contention.
> >>  RCU does well relative to hazard pointers as the number of updaters
> >> @@ -931,7 +930,7 @@ showed the effect of increasing update rates on lookups,
> >>  \cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo}
> >>  shows the effect of increasing update rates on the updates themselves.
> >>  Again, at the left-hand side of the figure all but one of the CPUs are
> >> -doing lookups and at the right-hand side of the figure all 448 CPUs are
> >> +doing lookups and at the right-hand side of the figure all 448~CPUs are
> >>  doing updates.
> >>  Hazard pointers and RCU start off with a significant advantage because,
> >>  unlike bucket locking, readers do not exclude updaters.
> >> -- 
> >> 2.17.1
> >> 
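As background for anyone reading along: the tilde in these hunks is
LaTeX's unbreakable space, which ties a number to its unit so that,
for example, "224" and "CPUs" can never be split across a line break,
and \, is the thin space used in quantities such as 1\,MB.  The v1
build error was presumably caused by applying the same substitution
inside a \cref{} argument, which is a label key rather than prose and
must match its \label{} exactly.  A minimal sketch of the distinction,
using text and a label taken from this patch:

	% Running text: tie numbers to their units with ~ and \, .
	Both show a change in slope at 224~CPUs, where the hash table
	comprises less than 1\,MB of data.

	% Cross-reference: the argument is a label key, so its spaces
	% must be left alone even though the key contains "448 CPUs".
	\cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size}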