Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx> --- datastruct/datastruct.tex | 348 +++++++++++++++++++------------------- 1 file changed, 174 insertions(+), 174 deletions(-) diff --git a/datastruct/datastruct.tex b/datastruct/datastruct.tex index 038f3923..a396b688 100644 --- a/datastruct/datastruct.tex +++ b/datastruct/datastruct.tex @@ -85,18 +85,18 @@ This section focuses on a single data structure, namely the hash table. This focused approach allows a much deeper investigation of how concurrency interacts with data structures, and also focuses on a data structure that is heavily used in practice. -Section~\ref{sec:datastruct:Hash-Table Design} +\Cref{sec:datastruct:Hash-Table Design} overviews the design, and -Section~\ref{sec:datastruct:Hash-Table Implementation} +\cref{sec:datastruct:Hash-Table Implementation} presents the implementation. Finally, -Section~\ref{sec:datastruct:Hash-Table Performance} +\cref{sec:datastruct:Hash-Table Performance} discusses the resulting performance and scalability. \subsection{Hash-Table Design} \label{sec:datastruct:Hash-Table Design} -Chapter~\ref{cha:Partitioning and Synchronization Design} +\Cref{cha:Partitioning and Synchronization Design} emphasized the need to apply partitioning in order to attain respectable performance and scalability, so partitionability must be a first-class criterion when selecting data structures. @@ -134,17 +134,17 @@ excellent scalability. \label{sec:datastruct:Hash-Table Implementation} \begin{fcvref}[ln:datastruct:hash_bkt:struct] -Listing~\ref{lst:datastruct:Hash-Table Data Structures} +\Cref{lst:datastruct:Hash-Table Data Structures} (\path{hash_bkt.c}) shows a set of data structures used in a simple fixed-sized hash table using chaining and per-hash-bucket locking, and -Figure~\ref{fig:datastruct:Hash-Table Data-Structure Diagram} +\cref{fig:datastruct:Hash-Table Data-Structure Diagram} diagrams how they fit together. 
The \co{hashtab} structure (\clnrefrange{tab:b}{tab:e} in -Listing~\ref{lst:datastruct:Hash-Table Data Structures}) +\cref{lst:datastruct:Hash-Table Data Structures}) contains four \co{ht_bucket} structures (\clnrefrange{bucket:b}{bucket:e} in -Listing~\ref{lst:datastruct:Hash-Table Data Structures}), +\cref{lst:datastruct:Hash-Table Data Structures}), with the \co{->ht_nbuckets} field controlling the number of buckets and the \co{->ht_cmp} field holding the pointer to key-comparison function. @@ -152,7 +152,7 @@ Each such bucket contains a list header \co{->htb_head} and a lock \co{->htb_lock}. The list headers chain \co{ht_elem} structures (\clnrefrange{elem:b}{elem:e} in -Listing~\ref{lst:datastruct:Hash-Table Data Structures}) +\cref{lst:datastruct:Hash-Table Data Structures}) through their \co{->hte_next} fields, and each \co{ht_elem} structure also caches the corresponding element's hash value in the \co{->hte_hash} field. @@ -173,13 +173,13 @@ which might contain a complex key. \label{fig:datastruct:Hash-Table Data-Structure Diagram} \end{figure} -Figure~\ref{fig:datastruct:Hash-Table Data-Structure Diagram} +\Cref{fig:datastruct:Hash-Table Data-Structure Diagram} shows bucket~0 containing two elements and bucket~2 containing one. \begin{fcvref}[ln:datastruct:hash_bkt:map_lock:map] -Listing~\ref{lst:datastruct:Hash-Table Mapping and Locking} +\Cref{lst:datastruct:Hash-Table Mapping and Locking} shows mapping and locking functions. -Lines~\lnref{b} and~\lnref{e} +\Clnref{b,e} show the macro \co{HASH2BKT()}, which maps from a hash value to the corresponding \co{ht_bucket} structure. This macro uses a simple modulus: if more aggressive hashing is required, @@ -195,24 +195,24 @@ corresponding to the specified hash value. 
\end{listing} \begin{fcvref}[ln:datastruct:hash_bkt:lookup] -Listing~\ref{lst:datastruct:Hash-Table Lookup} +\Cref{lst:datastruct:Hash-Table Lookup} shows \co{hashtab_lookup()}, which returns a pointer to the element with the specified hash and key if it exists, or \co{NULL} otherwise. This function takes both a hash value and a pointer to the key because this allows users of this function to use arbitrary keys and arbitrary hash functions. -Line~\lnref{map} maps from the hash value to a pointer to the corresponding +\Clnref{map} maps from the hash value to a pointer to the corresponding hash bucket. Each pass through the loop spanning \clnrefrange{loop:b}{loop:e} examines one element of the bucket's hash chain. -Line~\lnref{hashmatch} checks to see if the hash values match, and if not, -line~\lnref{next} +\Clnref{hashmatch} checks to see if the hash values match, and if not, +\clnref{next} proceeds to the next element. -Line~\lnref{keymatch} checks to see if the actual key matches, and if so, -line~\lnref{return} returns a pointer to the matching element. -If no element matches, line~\lnref{ret_NULL} returns \co{NULL}. +\Clnref{keymatch} checks to see if the actual key matches, and if so, +\clnref{return} returns a pointer to the matching element. +If no element matches, \clnref{ret_NULL} returns \co{NULL}. \end{fcvref} \begin{listing} @@ -225,7 +225,7 @@ If no element matches, line~\lnref{ret_NULL} returns \co{NULL}. \begin{fcvref}[ln:datastruct:hash_bkt:lookup] But isn't the double comparison on \clnrefrange{hashmatch}{return} in - Listing~\ref{lst:datastruct:Hash-Table Lookup} inefficient + \cref{lst:datastruct:Hash-Table Lookup} inefficient in the case where the key fits into an unsigned long? \end{fcvref} }\QuickQuizAnswer{ @@ -244,14 +244,14 @@ If no element matches, line~\lnref{ret_NULL} returns \co{NULL}. 
\label{lst:datastruct:Hash-Table Modification} \end{listing} -Listing~\ref{lst:datastruct:Hash-Table Modification} +\Cref{lst:datastruct:Hash-Table Modification} shows the \co{hashtab_add()} and \co{hashtab_del()} functions that add and delete elements from the hash table, respectively. \begin{fcvref}[ln:datastruct:hash_bkt:add_del:add] The \co{hashtab_add()} function simply sets the element's hash -value on line~\lnref{set}, then adds it to the corresponding bucket on -lines~\lnref{add:b} and~\lnref{add:e}. +value on \clnref{set}, then adds it to the corresponding bucket on +\clnref{add:b,add:e}. \end{fcvref} The \co{hashtab_del()} function simply removes the specified element from whatever hash chain it is on, courtesy of the doubly linked @@ -267,22 +267,22 @@ or modifying this same bucket, for example, by invoking \label{lst:datastruct:Hash-Table Allocation and Free} \end{listing} -Listing~\ref{lst:datastruct:Hash-Table Allocation and Free} +\Cref{lst:datastruct:Hash-Table Allocation and Free} shows \co{hashtab_alloc()} and \co{hashtab_free()}, which do hash-table allocation and freeing, respectively. \begin{fcvref}[ln:datastruct:hash_bkt:alloc_free:alloc] Allocation begins on \clnrefrange{alloc:b}{alloc:e} with allocation of the underlying memory. -If line~\lnref{chk_NULL} detects that memory has been exhausted, -line~\lnref{ret_NULL} returns +If \clnref{chk_NULL} detects that memory has been exhausted, +\clnref{ret_NULL} returns \co{NULL} to the caller. -Otherwise, lines~\lnref{set_nbck} and~\lnref{set_cmp} initialize +Otherwise, \clnref{set_nbck,set_cmp} initialize the number of buckets and the pointer to key-comparison function, and the loop spanning \clnrefrange{loop:b}{loop:e} initializes the buckets themselves, including the chain list header on -line~\lnref{init_head} and the lock on line~\lnref{init_lock}. -Finally, line~\lnref{return} returns a pointer to the newly allocated hash table. +\clnref{init_head} and the lock on \clnref{init_lock}. 
+Finally, \clnref{return} returns a pointer to the newly allocated hash table. \end{fcvref} \begin{fcvref}[ln:datastruct:hash_bkt:alloc_free:free] The \co{hashtab_free()} function on @@ -302,7 +302,7 @@ The \co{hashtab_free()} function on The performance results for a single 28-core socket of a 2.1\,GHz Intel Xeon system using a bucket-locked hash table with 262,144 buckets are shown in -Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo}. +\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo}. The performance does scale nearly linearly, but it falls far short of the ideal performance level, even at only 28~CPUs. Part of this shortfall is due to the fact that the lock acquisitions and @@ -317,7 +317,7 @@ on two or more CPUs. \end{figure} And things only get worse with more CPUs, as can be seen in -Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; 448 CPUs}. +\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; 448 CPUs}. We do not need to show ideal performance: The performance for 29~CPUs and beyond is all too clearly worse than abysmal. This clearly underscores the dangers of extrapolating performance from a @@ -349,7 +349,7 @@ We can test this by increasing the number of hash buckets. \end{figure} However, as can be seen in -Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}, +\cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets}, changing the number of buckets has almost no effect: Scalability is still abysmal. In particular, we still see a sharp dropoff at 29~CPUs and beyond. @@ -357,13 +357,13 @@ Clearly something else is going on. The problem is that this is a multi-socket system, with CPUs~0--27 and~224--251 mapped to the first socket as shown in -Figure~\ref{fig:datastruct:NUMA Topology of System Under Test}. 
+\cref{fig:datastruct:NUMA Topology of System Under Test}. Test runs confined to the first 28~CPUs therefore perform quite well, but tests that involve socket~0's CPUs~0--27 as well as socket~1's CPU~28 incur the overhead of passing data across socket boundaries. This can severely degrade performance, as was discussed in -Section~\ref{sec:cpu:Hardware System Architecture}. +\cref{sec:cpu:Hardware System Architecture}. In short, large multi-socket systems require good locality of reference in addition to full partitioning. The remainder of this chapter will discuss ways of providing good @@ -458,7 +458,7 @@ the need for read-side synchronization can degrade performance in read-mostly situations. However, we can achieve both performance and scalability by using RCU, which was introduced in -Section~\ref{sec:defer:Read-Copy Update (RCU)}. +\cref{sec:defer:Read-Copy Update (RCU)}. Similar results can be achieved using hazard pointers (\path{hazptr.c})~\cite{MagedMichael04a}, which will be included in the performance results shown in this @@ -469,17 +469,17 @@ section~\cite{McKenney:2013:SDS:2483852.2483867}. For an RCU-protected hash table with per-bucket locking, updaters use locking as shown in -Section~\ref{sec:datastruct:Partitionable Data Structures}, +\cref{sec:datastruct:Partitionable Data Structures}, but readers use RCU\@. The data structures remain as shown in -Listing~\ref{lst:datastruct:Hash-Table Data Structures}, +\cref{lst:datastruct:Hash-Table Data Structures}, and the \co{HASH2BKT()}, \co{hashtab_lock()}, and \co{hashtab_unlock()} functions remain as shown in -Listing~\ref{lst:datastruct:Hash-Table Mapping and Locking}. +\cref{lst:datastruct:Hash-Table Mapping and Locking}. However, readers use the lighter-weight concurrency-control embodied by \co{hashtab_lock_lookup()} and \co{hashtab_unlock_lookup()} shown in -Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control}. 
+\cref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control}. \begin{listing} \input{CodeSamples/datastruct/hash/hash_bkt_rcu@lock_unlock.fcv} @@ -487,11 +487,11 @@ Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Contr \label{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control} \end{listing} -Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Lookup} +\Cref{lst:datastruct:RCU-Protected Hash-Table Lookup} shows \co{hashtab_lookup()} for the RCU-protected per-bucket-locked hash table. This is identical to that in -Listing~\ref{lst:datastruct:Hash-Table Lookup} +\cref{lst:datastruct:Hash-Table Lookup} except that \co{cds_list_for_each_entry()} is replaced by \co{cds_list_for_each_entry_rcu()}. Both of these primitives traverse the hash chain referenced @@ -539,11 +539,11 @@ RCU read-side critical section, for example, the caller must invoke \label{lst:datastruct:RCU-Protected Hash-Table Modification} \end{listing} -Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Modification} +\Cref{lst:datastruct:RCU-Protected Hash-Table Modification} shows \co{hashtab_add()} and \co{hashtab_del()}, both of which are quite similar to their counterparts in the non-RCU hash table shown in -Listing~\ref{lst:datastruct:Hash-Table Modification}. +\cref{lst:datastruct:Hash-Table Modification}. The \co{hashtab_add()} function uses \co{cds_list_add_rcu()} instead of \co{cds_list_add()} in order to ensure proper ordering when an element is added to the hash table at the same time that it is @@ -569,7 +569,7 @@ freeing or otherwise reusing the memory for the newly deleted element. 
\label{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo} \end{figure} -Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo} +\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo} shows the read-only performance of RCU-protected and hazard-pointer-protected hash tables against the previous section's per-bucket-locked implementation. As you can see, both RCU and hazard pointers perform and scale @@ -587,7 +587,7 @@ RCU does slightly better than hazard pointers. \label{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo; Linear Scale} \end{figure} -Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo; Linear Scale} +\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo; Linear Scale} shows the same data on a linear scale. This drops the global-locking trace into the x-axis, but allows the non-ideal performance of RCU and hazard pointers to be more readily @@ -621,7 +621,7 @@ advantage depends on the workload. But why is RCU's performance a factor of five less than ideal? One possibility is that the per-thread counters manipulated by \co{rcu_read_lock()} and \co{rcu_read_unlock()} are slowing things down. -Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR; Linear Scale} +\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR; Linear Scale} therefore adds the results for the QSBR variant of RCU, whose read-side primitives do nothing. And although QSBR does perform slightly better than does RCU, it is still @@ -634,7 +634,7 @@ about a factor of five short of ideal. 
\label{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale} \end{figure} -Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale} +\Cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale} adds completely unsynchronized results, which works because this is a read-only benchmark with nothing to synchronize. Even with no synchronization whatsoever, performance still falls far @@ -642,12 +642,12 @@ short of ideal. The problem is that this system has sockets with 28 cores, which have the modest cache sizes shown in -Figure~\ref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz} -on page~\pageref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}. +\cref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz} +on \cpageref{tab:cpu:Cache Geometry for 8-Socket System With Intel Xeon Platinum 8176 CPUs @ 2.10GHz}. Each hash bucket (\co{struct ht_bucket}) occupies 56~bytes and each element (\co{struct zoo_he}) occupies 72~bytes for the RCU and QSBR runs. The benchmark generating -Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale} +\cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo including QSBR and Unsynchronized; Linear Scale} used 262,144 buckets and up to 262,144 elements, for a total of 33,554,448~bytes, which not only overflows the 1,048,576-byte L2 caches by more than a factor of thirty, but is also uncomfortably close to the @@ -681,8 +681,8 @@ to about half again faster than that of either QSBR or RCU\@. 
\QuickQuiz{ How can we be so sure that the hash-table size is at fault here, especially given that - Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} - on page~\pageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} + \cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} + on \cpageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} shows that varying hash-table size has almost no effect? Might the problem instead be something like false sharing? @@ -703,7 +703,7 @@ to about half again faster than that of either QSBR or RCU\@. Still unconvinced? Then look at the log-log plot in - Figure~\ref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size}, + \cref{fig:datastruct:Read-Only RCU-Protected Hash-Table Performance For Schr\"odinger's Zoo at 448 CPUs; Varying Table Size}, which shows performance for 448 CPUs as a function of the hash-table size, that is, number of buckets and maximum number of elements. @@ -719,7 +719,7 @@ to about half again faster than that of either QSBR or RCU\@. This near-ideal performance is consistent with that for the pre-BSD routing table shown in \cref{fig:defer:Pre-BSD Routing Table Protected by RCU} - on page~\pageref{fig:defer:Pre-BSD Routing Table Protected by RCU}, + on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU}, even at 448 CPUs. However, the performance drops significantly (this is a log-log plot) at about 8,000~elements, which is where the 1,048,576-byte @@ -734,8 +734,8 @@ to about half again faster than that of either QSBR or RCU\@. a factor of 25. 
The reason that - Figure~\ref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} - on page~\pageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} + \cref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} + on \cpageref{fig:datastruct:Read-Only Hash-Table Performance For Schroedinger's Zoo; Varying Buckets} shows little effect is that its data was gathered from bucket-locked hash tables, where locking overhead and contention drowned out cache-capacity effects. @@ -757,7 +757,7 @@ to about half again faster than that of either QSBR or RCU\@. What if the memory footprint is reduced still further? \Cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR} -on page~\pageref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head} +on \cpageref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head} shows that RCU attains very nearly ideal performance on the much smaller data structure represented by the pre-BSD routing table. @@ -799,7 +799,7 @@ data structure represented by the pre-BSD routing table. As noted earlier, Schr\"odinger is surprised by the popularity of his cat~\cite{ErwinSchroedinger1935Cat}, but recognizes the need to reflect this popularity in his design. -Figure~\ref{fig:datastruct:Read-Side Cat-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo at 64 CPUs} +\Cref{fig:datastruct:Read-Side Cat-Only RCU-Protected Hash-Table Performance For Schroedinger's Zoo at 64 CPUs} shows the results of 64-CPU runs, varying the number of CPUs that are doing nothing but looking up the cat. Both RCU and hazard pointers respond well to this challenge, but bucket @@ -830,7 +830,7 @@ in point. If we were only ever going to read the data, we would not need any concurrency control to begin with. 
-Figure~\ref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates} +\Cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates} therefore shows the effect of updates on readers. At the extreme left-hand side of this graph, all but one of the CPUs are doing lookups, while to the right all 448 CPUs are doing updates. @@ -854,9 +854,9 @@ execution, greatly reducing memory-barrier overhead in the read-only case. \end{figure} Where -Figure~\ref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates} +\cref{fig:datastruct:Read-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo in the Presence of Updates} showed the effect of increasing update rates on lookups, -Figure~\ref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo} +\cref{fig:datastruct:Update-Side RCU-Protected Hash-Table Performance For Schroedinger's Zoo} shows the effect of increasing update rates on the updates themselves. Again, at the left-hand side of the figure all but one of the CPUs are doing lookups and at the right-hand side of the figure all 448 CPUs are @@ -885,7 +885,7 @@ not recommended for production use. \QuickQuiz{ The dangers of extrapolating from 28 CPUs to 448 CPUs was made quite clear in - Section~\ref{sec:datastruct:Hash-Table Performance}. + \cref{sec:datastruct:Hash-Table Performance}. But why should extrapolating up from 448 CPUs be any safer? }\QuickQuizAnswer{ In theory, it isn't any safer, and a useful exercise would be @@ -943,7 +943,7 @@ minute. In this case, the two veterinarians would disagree on the state of the cat for the second period of thirty seconds following the last heartbeat, as fancifully depicted in -Figure~\ref{fig:datastruct:Even Veterinarians Disagree}. +\cref{fig:datastruct:Even Veterinarians Disagree}. 
\pplsur{Weiner}{Heisenberg} taught us to live with this sort of uncertainty~\cite{WeinerHeisenberg1927Uncertain}, which is a good @@ -959,9 +959,9 @@ Furthermore, most computing systems are intended to interact with the outside world. Consistency with the outside world is therefore of paramount importance. However, as we saw in -Figure~\ref{fig:defer:Response Time of RCU vs. Reader-Writer Locking} +\cref{fig:defer:Response Time of RCU vs. Reader-Writer Locking} on -page~\pageref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}, +\cpageref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}, increased internal consistency can come at the expense of degraded external consistency. Techniques such as RCU and hazard pointers give up some degree of @@ -990,7 +990,7 @@ or all of the above. Fixed-size hash tables are perfectly partitionable, but resizable hash tables pose partitioning challenges when growing or shrinking, as fancifully depicted in -Figure~\ref{fig:datastruct:Partitioning Problems}. +\cref{fig:datastruct:Partitioning Problems}. However, it turns out that it is possible to construct high-performance scalable RCU-protected hash tables, as described in the following sections. @@ -1004,7 +1004,7 @@ The first (and simplest) was developed for the Linux kernel by \ppl{Herbert}{Xu}~\cite{HerbertXu2010RCUResizeHash}, and is described in the following sections. The other two are covered briefly in -Section~\ref{sec:datastruct:Other Resizable Hash Tables}. +\cref{sec:datastruct:Other Resizable Hash Tables}. The key insight behind the first hash-table implementation is that each data element can have two sets of list pointers, with one set @@ -1073,7 +1073,7 @@ which is the subject of the next section. 
Resizing is accomplished by the classic approach of inserting a level of indirection, in this case, the \co{ht} structure shown on \clnrefrange{ht:b}{ht:e} of -Listing~\ref{lst:datastruct:Resizable Hash-Table Data Structures} +\cref{lst:datastruct:Resizable Hash-Table Data Structures} (\path{hash_resize.c}). The \co{hashtab} structure shown on \clnrefrange{hashtab:b}{hashtab:e} contains only a @@ -1092,17 +1092,17 @@ we should be able to make good use of RCU\@. \end{listing} The \co{ht} structure represents a specific size of the hash table, -as specified by the \co{->ht_nbuckets} field on line~\lnref{ht:nbuckets}. +as specified by the \co{->ht_nbuckets} field on \clnref{ht:nbuckets}. The size is stored in the same structure containing the array of buckets (\co{->ht_bkt[]} on -line~\lnref{ht:bkt}) in order to avoid mismatches between +\clnref{ht:bkt}) in order to avoid mismatches between the size and the array. The \co{->ht_resize_cur} field on -line~\lnref{ht:resize_cur} is equal to $-1$ unless a resize +\clnref{ht:resize_cur} is equal to $-1$ unless a resize operation is in progress, in which case it indicates the index of the bucket whose elements are being inserted into the new hash table, which is referenced -by the \co{->ht_new} field on line~\lnref{ht:new}. +by the \co{->ht_new} field on \clnref{ht:new}. If there is no resize operation in progress, \co{->ht_new} is \co{NULL}. Thus, a resize operation proceeds by allocating a new \co{ht} structure and referencing it via the \co{->ht_new} pointer, then advancing @@ -1113,10 +1113,10 @@ Once all old readers have completed, the old hash table's \co{ht} structure may be freed. The \co{->ht_idx} field on -line~\lnref{ht:idx} indicates which of the two sets of +\clnref{ht:idx} indicates which of the two sets of list pointers are being used by this instantiation of the hash table, and is used to index the \co{->hte_next[]} array in the \co{ht_elem} -structure on line~\lnref{ht_elem:next}. 
+structure on \clnref{ht_elem:next}. The \co{->ht_cmp()}, \co{->ht_gethash()}, and \co{->ht_getkey()} fields on \clnrefrange{ht:cmp}{ht:getkey} @@ -1158,7 +1158,7 @@ the old table. \begin{fcvref}[ln:datastruct:hash_resize:get_bucket] Bucket selection is shown in -Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection}, +\cref{lst:datastruct:Resizable Hash-Table Bucket Selection}, which shows \co{ht_get_bucket()} on \clnrefrange{single:b}{single:e} and \co{ht_search_bucket()} on \clnrefrange{hsb:b}{hsb:e}. @@ -1167,26 +1167,26 @@ corresponding to the specified key in the specified hash table, without making any allowances for resizing. It also stores the bucket index corresponding to the key into the location referenced by parameter~\co{b} on -line~\lnref{single:gethash}, and the corresponding +\clnref{single:gethash}, and the corresponding hash value corresponding to the key into the location -referenced by parameter~\co{h} (if non-\co{NULL}) on line~\lnref{single:h}. -Line~\lnref{single:return} then returns a reference to the corresponding bucket. +referenced by parameter~\co{h} (if non-\co{NULL}) on \clnref{single:h}. +\Clnref{single:return} then returns a reference to the corresponding bucket. The \co{ht_search_bucket()} function searches for the specified key within the specified hash-table version. -Line~\lnref{hsb:get_curbkt} obtains a reference to the bucket corresponding +\Clnref{hsb:get_curbkt} obtains a reference to the bucket corresponding to the specified key. The loop spanning \clnrefrange{hsb:loop:b}{hsb:loop:e} searches -that bucket, so that if line~\lnref{hsb:match} detects a match, -line~\lnref{hsb:ret_match} returns a pointer to the enclosing data element. +that bucket, so that if \clnref{hsb:match} detects a match, +\clnref{hsb:ret_match} returns a pointer to the enclosing data element. Otherwise, if there is no match, -line~\lnref{hsb:ret_NULL} returns \co{NULL} to indicate +\clnref{hsb:ret_NULL} returns \co{NULL} to indicate failure. 
\end{fcvref} \QuickQuiz{ How does the code in - Listing~\ref{lst:datastruct:Resizable Hash-Table Bucket Selection} + \cref{lst:datastruct:Resizable Hash-Table Bucket Selection} protect against the resizing process progressing past the selected bucket? }\QuickQuizAnswer{ @@ -1206,33 +1206,33 @@ operation. \end{listing} Read-side concurrency control is provided by RCU as was shown in -Listing~\ref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control}, +\cref{lst:datastruct:RCU-Protected Hash-Table Read-Side Concurrency Control}, but the update-side concurrency\-/control functions \co{hashtab_lock_mod()} and \co{hashtab_unlock_mod()} must now deal with the possibility of a concurrent resize operation as shown in -Listing~\ref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}. +\cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}. \begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod:l] The \co{hashtab_lock_mod()} spans \clnrefrange{b}{e} in the listing. -Line~\lnref{rcu_lock} enters an RCU read-side critical section to prevent +\Clnref{rcu_lock} enters an RCU read-side critical section to prevent the data structures from being freed during the traversal, -line~\lnref{refhashtbl} acquires a reference to the current hash table, and then -line~\lnref{refbucket} obtains a reference to the bucket in this hash table +\clnref{refhashtbl} acquires a reference to the current hash table, and then +\clnref{refbucket} obtains a reference to the bucket in this hash table corresponding to the key. -Line~\lnref{acq_bucket} acquires that bucket's lock, which will prevent any concurrent +\Clnref{acq_bucket} acquires that bucket's lock, which will prevent any concurrent resizing operation from distributing that bucket, though of course it will have no effect if that bucket has already been distributed. 
\Clnrefrange{lsp0b}{lsp0e} store the bucket pointer and pointer-set index into their respective fields in the \co{ht_lock_state} structure, which communicates the information to \co{hashtab_add()}, \co{hashtab_del()}, and \co{hashtab_unlock_mod()}. -Line~\lnref{ifresized} then checks to see if a concurrent resize +\Clnref{ifresized} then checks to see if a concurrent resize operation has already distributed this bucket across the new hash table, -and if not, line~\lnref{lsp1_1} indicates that there is no +and if not, \clnref{lsp1_1} indicates that there is no already-resized hash bucket and -line~\lnref{fastret1} returns with the selected hash bucket's +\clnref{fastret1} returns with the selected hash bucket's lock held (thus preventing a concurrent resize operation from distributing this bucket) and also within an RCU read-side critical section. \IX{Deadlock} is avoided because the old table's locks are always acquired @@ -1241,9 +1241,9 @@ than two versions from existing at a given time, thus preventing a deadlock cycle. Otherwise, a concurrent resize operation has already distributed this -bucket, so line~\lnref{new_hashtbl} proceeds to the new hash table, -line~\lnref{get_newbkt} selects the bucket corresponding to the key, -and line~\lnref{acq_newbkt} acquires the bucket's lock. +bucket, so \clnref{new_hashtbl} proceeds to the new hash table, +\clnref{get_newbkt} selects the bucket corresponding to the key, +and \clnref{acq_newbkt} acquires the bucket's lock. \Clnrefrange{lsp1b}{lsp1e} store the bucket pointer and pointer-set index into their respective fields in the \co{ht_lock_state} structure, which again communicates this information to @@ -1262,11 +1262,11 @@ section. \begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod:ul] The \co{hashtab_unlock_mod()} function releases the lock(s) acquired by \co{hashtab_lock_mod()}. -Line~\lnref{relbkt0} releases the lock on the old \co{ht_bucket} structure. 
-In the unlikely event that line~\lnref{ifbkt1} determines that a resize -operation is in progress, line~\lnref{relbkt1} releases the lock on the +\Clnref{relbkt0} releases the lock on the old \co{ht_bucket} structure. +In the unlikely event that \clnref{ifbkt1} determines that a resize +operation is in progress, \clnref{relbkt1} releases the lock on the new \co{ht_bucket} structure. -Either way, line~\lnref{rcu_unlock} exits the RCU read-side critical +Either way, \clnref{rcu_unlock} exits the RCU read-side critical section. \end{fcvref} @@ -1298,23 +1298,23 @@ Now that we have bucket selection and concurrency control in place, we are ready to search and update our resizable hash table. The \co{hashtab_lookup()}, \co{hashtab_add()}, and \co{hashtab_del()} functions are shown in -Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions}. +\cref{lst:datastruct:Resizable Hash-Table Access Functions}. \begin{fcvref}[ln:datastruct:hash_resize:access:lkp] The \co{hashtab_lookup()} function on \clnrefrange{b}{e} of the listing does hash lookups. -Line~\lnref{get_curtbl} fetches the current hash table and -line~\lnref{get_curbkt} searches the bucket corresponding to the +\Clnref{get_curtbl} fetches the current hash table and +\clnref{get_curbkt} searches the bucket corresponding to the specified key. -Line~\lnref{ret} returns a pointer to the searched-for element +\Clnref{ret} returns a pointer to the searched-for element or \co{NULL} when the search fails. The caller must be within an RCU read-side critical section. \end{fcvref} \QuickQuiz{ The \co{hashtab_lookup()} function in - Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions} + \cref{lst:datastruct:Resizable Hash-Table Access Functions} ignores concurrent resize operations. Doesn't this mean that readers might miss an element that was previously added during a resize operation? @@ -1329,12 +1329,12 @@ The caller must be within an RCU read-side critical section. 
\begin{fcvref}[ln:datastruct:hash_resize:access:add] The \co{hashtab_add()} function on \clnrefrange{b}{e} of the listing adds new data elements to the hash table. -Line~\lnref{htbp} picks up the current \co{ht_bucket} structure into which the -new element is to be added, and line~\lnref{i} picks up the index of +\Clnref{htbp} picks up the current \co{ht_bucket} structure into which the +new element is to be added, and \clnref{i} picks up the index of the pointer pair. -Line~\lnref{add} adds the new element to the current hash bucket. -If line~\lnref{ifnew} determines that this bucket has been distributed -to a new version of the hash table, then line~\lnref{addnew} also adds the +\Clnref{add} adds the new element to the current hash bucket. +If \clnref{ifnew} determines that this bucket has been distributed +to a new version of the hash table, then \clnref{addnew} also adds the new element to the corresponding new bucket. The caller is required to handle concurrency, for example, by invoking \co{hashtab_lock_mod()} before the call to \co{hashtab_add()} and invoking @@ -1345,10 +1345,10 @@ The caller is required to handle concurrency, for example, by invoking The \co{hashtab_del()} function on \clnrefrange{b}{e} of the listing removes an existing element from the hash table. -Line~\lnref{i} picks up the index of the pointer pair -and line~\lnref{del} removes the specified element from the current table. -If line~\lnref{ifnew} determines that this bucket has been distributed -to a new version of the hash table, then line~\lnref{delnew} also removes +\Clnref{i} picks up the index of the pointer pair +and \clnref{del} removes the specified element from the current table. +If \clnref{ifnew} determines that this bucket has been distributed +to a new version of the hash table, then \clnref{delnew} also removes the specified element from the corresponding new bucket. 
As with \co{hashtab_add()}, the caller is responsible for concurrency control and this concurrency control suffices for synchronizing with @@ -1357,7 +1357,7 @@ a concurrent resize operation. \QuickQuiz{ The \co{hashtab_add()} and \co{hashtab_del()} functions in - Listing~\ref{lst:datastruct:Resizable Hash-Table Access Functions} + \cref{lst:datastruct:Resizable Hash-Table Access Functions} can update two hash buckets while a resize operation is progressing. This might cause poor performance if the frequency of resize operation is not negligible. @@ -1366,8 +1366,8 @@ a concurrent resize operation. Yes, at least assuming that a slight increase in the cost of \co{hashtab_lookup()} is acceptable. One approach is shown in - Listings~\ref{lst:datastruct:Resizable Hash-Table Access Functions (Fewer Updates)} - and~\ref{lst:datastruct:Resizable Hash-Table Update-Side Locking Function (Fewer Updates)} + \cref{lst:datastruct:Resizable Hash-Table Access Functions (Fewer Updates),% + lst:datastruct:Resizable Hash-Table Update-Side Locking Function (Fewer Updates)} (\path{hash_resize_s.c}). \begin{listing} @@ -1407,42 +1407,42 @@ a concurrent resize operation. \begin{fcvref}[ln:datastruct:hash_resize:resize] The actual resizing itself is carried out by \co{hashtab_resize}, shown in -Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing} on -page~\pageref{lst:datastruct:Resizable Hash-Table Resizing}. -Line~\lnref{trylock} conditionally acquires the top-level \co{->ht_lock}, and if -this acquisition fails, line~\lnref{ret_busy} returns \co{-EBUSY} to indicate that +\cref{lst:datastruct:Resizable Hash-Table Resizing} on +\cpageref{lst:datastruct:Resizable Hash-Table Resizing}. +\Clnref{trylock} conditionally acquires the top-level \co{->ht_lock}, and if +this acquisition fails, \clnref{ret_busy} returns \co{-EBUSY} to indicate that a resize is already in progress. 
-Otherwise, line~\lnref{get_curtbl} picks up a reference to the current hash table, +Otherwise, \clnref{get_curtbl} picks up a reference to the current hash table, and \clnrefrange{alloc:b}{alloc:e} allocate a new hash table of the desired size. If a new set of hash/key functions have been specified, these are used for the new table, otherwise those of the old table are preserved. -If line~\lnref{chk_nomem} detects memory-allocation failure, -line~\lnref{rel_nomem} releases \co{->ht_lock} -and line~\lnref{ret_nomem} returns a failure indication. +If \clnref{chk_nomem} detects memory-allocation failure, +\clnref{rel_nomem} releases \co{->ht_lock} +and \clnref{ret_nomem} returns a failure indication. -Line~\lnref{get_curidx} picks up the current table's index and -line~\lnref{put_curidx} stores its inverse to +\Clnref{get_curidx} picks up the current table's index and +\clnref{put_curidx} stores its inverse to the new hash table, thus ensuring that the two hash tables avoid overwriting each other's linked lists. -Line~\lnref{set_newtbl} then starts the bucket-distribution process by +\Clnref{set_newtbl} then starts the bucket-distribution process by installing a reference to the new table into the \co{->ht_new} field of the old table. -Line~\lnref{sync_rcu} ensures that all readers who are not aware of the +\Clnref{sync_rcu} ensures that all readers who are not aware of the new table complete before the resize operation continues. Each pass through the loop spanning \clnrefrange{loop:b}{loop:e} distributes the contents of one of the old hash table's buckets into the new hash table. -Line~\lnref{get_oldcur} picks up a reference to the old table's current bucket -and line~\lnref{acq_oldcur} acquires that bucket's spinlock. +\Clnref{get_oldcur} picks up a reference to the old table's current bucket +and \clnref{acq_oldcur} acquires that bucket's spinlock. 
\end{fcvref} \QuickQuiz{ \begin{fcvref}[ln:datastruct:hash_resize:resize] In the \co{hashtab_resize()} function in - Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing}, - what guarantees that the update to \co{->ht_new} on line~\lnref{set_newtbl} + \cref{lst:datastruct:Resizable Hash-Table Resizing}, + what guarantees that the update to \co{->ht_new} on \clnref{set_newtbl} will be seen as happening before the update to \co{->ht_resize_cur} - on line~\lnref{update_resize} from the perspective of + on \clnref{update_resize} from the perspective of \co{hashtab_add()} and \co{hashtab_del()}? In other words, what prevents \co{hashtab_add()} and \co{hashtab_del()} from dereferencing @@ -1450,12 +1450,12 @@ and line~\lnref{acq_oldcur} acquires that bucket's spinlock. \end{fcvref} }\QuickQuizAnswer{ \begin{fcvref}[ln:datastruct:hash_resize:resize] - The \co{synchronize_rcu()} on line~\lnref{sync_rcu} of - Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing} + The \co{synchronize_rcu()} on \clnref{sync_rcu} of + \cref{lst:datastruct:Resizable Hash-Table Resizing} ensures that all pre-existing RCU readers have completed between the time that we install the new hash-table reference on - line~\lnref{set_newtbl} and the time that we update \co{->ht_resize_cur} on - line~\lnref{update_resize}. + \clnref{set_newtbl} and the time that we update \co{->ht_resize_cur} on + \clnref{update_resize}. This means that any reader that sees a non-negative value of \co{->ht_resize_cur} cannot have started before the assignment to \co{->ht_new}, and thus must be able to see @@ -1465,7 +1465,7 @@ and line~\lnref{acq_oldcur} acquires that bucket's spinlock. \co{hashtab_del()} functions must be enclosed in RCU read-side critical sections, courtesy of \co{hashtab_lock_mod()} and \co{hashtab_unlock_mod()} in - Listing~\ref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency + \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}. 
\end{fcvref} }\QuickQuizEnd @@ -1475,30 +1475,30 @@ Each pass through the loop spanning \clnrefrange{loop_list:b}{loop_list:e} adds one data element from the current old-table bucket to the corresponding new-table bucket, holding the new-table bucket's lock during the add operation. -Line~\lnref{update_resize} updates +\Clnref{update_resize} updates \co{->ht_resize_cur} to indicate that this bucket has been distributed. -Finally, line~\lnref{rel_oldcur} releases the old-table bucket lock. +Finally, \clnref{rel_oldcur} releases the old-table bucket lock. -Execution reaches line~\lnref{rcu_assign} once all old-table buckets have been distributed +Execution reaches \clnref{rcu_assign} once all old-table buckets have been distributed across the new table. -Line~\lnref{rcu_assign} installs the newly created table as the current one, and -line~\lnref{sync_rcu_2} waits for all old readers (who might still be referencing +\Clnref{rcu_assign} installs the newly created table as the current one, and +\clnref{sync_rcu_2} waits for all old readers (who might still be referencing the old table) to complete. -Then line~\lnref{rel_master} releases the resize-serialization lock, -line~\lnref{free} frees -the old hash table, and finally line~\lnref{ret_success} returns success. +Then \clnref{rel_master} releases the resize-serialization lock, +\clnref{free} frees +the old hash table, and finally \clnref{ret_success} returns success. \end{fcvref} \QuickQuiz{ \begin{fcvref}[ln:datastruct:hash_resize:resize] - Why is there a \co{WRITE_ONCE()} on line~\lnref{update_resize} - in Listing~\ref{lst:datastruct:Resizable Hash-Table Resizing}? + Why is there a \co{WRITE_ONCE()} on \clnref{update_resize} + in \cref{lst:datastruct:Resizable Hash-Table Resizing}? 
\end{fcvref} }\QuickQuizAnswer{ \begin{fcvref}[ln:datastruct:hash_resize:lock_unlock_mod] Together with the \co{READ_ONCE()} - on line~\lnref{l:ifresized} in \co{hashtab_lock_mod()} - of Listing~\ref{lst:datastruct:Resizable Hash-Table Update-Side + on \clnref{l:ifresized} in \co{hashtab_lock_mod()} + of \cref{lst:datastruct:Resizable Hash-Table Update-Side Concurrency Control}, it tells the compiler that the non-initialization accesses to \co{->ht_resize_cur} must remain because reads @@ -1518,7 +1518,7 @@ the old hash table, and finally line~\lnref{ret_success} returns success. \end{figure} % Data from CodeSamples/datastruct/hash/data/hps.resize.2020.09.05a -Figure~\ref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements} +\Cref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements} compares resizing hash tables to their fixed-sized counterparts for 262,144 and 2,097,152 elements in the hash table. The figure shows three traces for each element count, one @@ -1549,7 +1549,7 @@ can only be expected to produce a sharp decrease in performance, as in fact is shown in the graph. But worse yet, the hash-table elements occupy 128\,MB, which overflows each socket's 39\,MB L3 cache, with performance consequences analogous -to those described in Section~\ref{sec:cpu:Costs of Operations}. +to those described in \cref{sec:cpu:Costs of Operations}. The resulting cache overflow means that the memory system is involved even for a read-only benchmark, and as you can see from the sublinear portions of the lower three traces, the memory system can be a serious @@ -1558,7 +1558,7 @@ bottleneck. \QuickQuiz{ How much of the difference in performance between the large and small hash tables shown in - Figure~\ref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. 
Total Number of Elements} + \cref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements} was due to long hash chains and how much was due to memory-system bottlenecks? }\QuickQuizAnswer{ @@ -1579,8 +1579,8 @@ bottleneck. the middle of \cref{fig:datastruct:Effect of Memory-System Bottlenecks on Hash Tables}. The other six traces are identical to their counterparts in - Figure~\ref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements} - on page~\pageref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}. + \cref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements} + on \cpageref{fig:datastruct:Overhead of Resizing Hash Tables Between 262;144 and 524;288 Buckets vs. Total Number of Elements}. The gap between this new trace and the lower set of three traces is a rough measure of how much of the difference in performance was due to hash-chain length, and the gap between @@ -1621,7 +1621,7 @@ bottleneck. }\QuickQuizEnd Referring to the last column of -Table~\ref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}, +\cref{tab:cpu:CPU 0 View of Synchronization Mechanisms on 8-Socket System With Intel Xeon Platinum 8176 CPUs at 2.10GHz}, we recall that the first 28~CPUs are in the first socket, on a one-CPU-per-core basis, which explains the sharp decrease in performance of the resizable hash table beyond 28~CPUs. @@ -1690,7 +1690,7 @@ extraneous data element due to key mismatches. 
The process of shrinking a relativistic hash table by a factor of two is shown in -Figure~\ref{fig:datastruct:Shrinking a Relativistic Hash Table}, +\cref{fig:datastruct:Shrinking a Relativistic Hash Table}, in this case shrinking a two-bucket hash table into a one-bucket hash table, otherwise known as a linear list. This process works by coalescing pairs of buckets in the old larger hash @@ -1746,7 +1746,7 @@ state~(f). Growing a relativistic hash table reverses the shrinking process, but requires more grace-period steps, as shown in -Figure~\ref{fig:datastruct:Growing a Relativistic Hash Table}. +\cref{fig:datastruct:Growing a Relativistic Hash Table}. The initial state~(a) is at the top of this figure, with time advancing from top to bottom. @@ -1814,12 +1814,12 @@ library~\cite{MathieuDesnoyers2009URCU}. The preceding sections have focused on data structures that enhance concurrency due to partitionability -(Section~\ref{sec:datastruct:Partitionable Data Structures}), +(\cref{sec:datastruct:Partitionable Data Structures}), efficient handling of read-mostly access patterns -(Section~\ref{sec:datastruct:Read-Mostly Data Structures}), +(\cref{sec:datastruct:Read-Mostly Data Structures}), or application of read-mostly techniques to avoid non-partitionability -(Section~\ref{sec:datastruct:Non-Partitionable Data Structures}). +(\cref{sec:datastruct:Non-Partitionable Data Structures}). This section gives a brief review of other data structures. One of the hash table's greatest advantages for parallel use is that it @@ -1870,7 +1870,7 @@ represents an early academic use of a technique resembling RCU~\cite{Pugh90}. Concurrent double-ended queues were discussed in -Section~\ref{sec:SMPdesign:Double-Ended Queue}, +\cref{sec:SMPdesign:Double-Ended Queue}, and concurrent stacks and queues have a long history~\cite{Treiber86}, though not normally the most impressive performance or scalability. 
They are nevertheless a common feature of concurrent @@ -1907,7 +1907,7 @@ alone to the set of CPU families in common use today. \label{sec:datastruct:Specialization} The resizable hash table presented in -Section~\ref{sec:datastruct:Non-Partitionable Data Structures} +\cref{sec:datastruct:Non-Partitionable Data Structures} used an opaque type for the key. This allows great flexibility, permitting any sort of key to be used, but it also incurs significant overhead due to the calls via @@ -1924,8 +1924,8 @@ This overhead can be eliminated by specializing a hash-table implementation to a given key type and hash function, for example, by using C++ templates. Doing so eliminates the \co{->ht_cmp()}, \co{->ht_gethash()}, and \co{->ht_getkey()} function pointers in the \co{ht} structure shown in -Listing~\ref{lst:datastruct:Resizable Hash-Table Data Structures} on -page~\pageref{lst:datastruct:Resizable Hash-Table Data Structures}. +\cref{lst:datastruct:Resizable Hash-Table Data Structures} on +\cpageref{lst:datastruct:Resizable Hash-Table Data Structures}. It also eliminates the corresponding calls through these pointers, which could allow the compiler to inline the resulting fixed functions, eliminating not only the overhead of the call instruction, but the @@ -1980,8 +1980,8 @@ days of four-kilobyte address spaces. The hash tables discussed in this chapter made almost no attempt to conserve memory. For example, the \co{->ht_idx} field in the \co{ht} structure in -Listing~\ref{lst:datastruct:Resizable Hash-Table Data Structures} on -page~\pageref{lst:datastruct:Resizable Hash-Table Data Structures} +\cref{lst:datastruct:Resizable Hash-Table Data Structures} on +\cpageref{lst:datastruct:Resizable Hash-Table Data Structures} always has a value of either zero or one, yet takes up a full 32 bits of memory. 
It could be eliminated, for example, by stealing a bit from the @@ -2023,11 +2023,11 @@ Despite these disadvantages, bit-spinlocks are extremely useful when memory is at a premium. One aspect of the second opportunity was covered in -Section~\ref{sec:datastruct:Other Resizable Hash Tables}, +\cref{sec:datastruct:Other Resizable Hash Tables}, which presented resizable hash tables that require only one set of bucket-list pointers in place of the pair of sets required by the resizable hash table presented in -Section~\ref{sec:datastruct:Non-Partitionable Data Structures}. +\cref{sec:datastruct:Non-Partitionable Data Structures}. Another approach would be to use singly linked bucket lists in place of the doubly linked lists used in this chapter. One downside of this approach is that deletion would then require @@ -2052,7 +2052,7 @@ Modern computers typically move data between CPUs and main memory in fixed-sized blocks that range in size from 32 bytes to 256 bytes. These blocks are called \emph{\IXpl{cache line}}, and are extremely important to high performance and scalability, as was discussed in -Section~\ref{sec:cpu:Overheads}. +\cref{sec:cpu:Overheads}. One timeworn way to kill both performance and scalability is to place incompatible variables into the same cacheline. For example, suppose that a resizable hash table data element had @@ -2076,7 +2076,7 @@ struct hash_elem { \end{listing} One way to solve this problem on systems with 64-byte cache line is shown in -Listing~\ref{lst:datastruct:Alignment for 64-Byte Cache Lines}. +\cref{lst:datastruct:Alignment for 64-Byte Cache Lines}. Here \GCC's \co{aligned} attribute is used to force the \co{->counter} and the \co{ht_elem} structure into separate cache lines. This would allow CPUs to traverse the hash bucket list at full speed @@ -2120,10 +2120,10 @@ cache geometry: or task. We saw several very effective examples of this rule of thumb in the counter implementations in - Chapter~\ref{chp:Counting}. 
+ \cref{chp:Counting}. \item Going one step further, partition your data on a per-CPU, per-thread, or per-task basis, as was discussed in - Chapter~\ref{chp:Data Ownership}. + \cref{chp:Data Ownership}. \end{enumerate} There has been some work towards automated trace-based rearrangement @@ -2167,7 +2167,7 @@ to a given situation. This chapter has focused primarily on hash tables, including resizable hash tables, which are not fully partitionable. -Section~\ref{sec:datastruct:Other Data Structures} gave a quick +\Cref{sec:datastruct:Other Data Structures} gave a quick overview of a few non-hash-table data structures. Nevertheless, this exposition of hash tables is an excellent introduction to the many issues surrounding high-performance scalable data access, -- 2.17.1
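For readers skimming the patch without the book at hand, the per-bucket-locking design that the rewritten cross-references keep pointing at (the \co{hashtab}/\co{ht_bucket}/\co{ht_elem} structures and the \co{HASH2BKT()} modulus mapping) can be summarized by a small standalone sketch. This is a simplified, pthread-based illustration, not the book's actual \path{hash_bkt.c}: the field and function names mirror the listing the text describes, but the integer-key layout and all sizes here are assumptions.

```c
/* Minimal sketch of a fixed-size hash table with chaining and
 * per-bucket locking, loosely following the hash_bkt.c design the
 * patch text describes.  Illustrative only, not the book's code. */
#include <pthread.h>
#include <stdlib.h>

struct ht_elem {
	struct ht_elem *hte_next;	/* chains elements within a bucket */
	unsigned long hte_hash;		/* cached hash value */
	unsigned long key;		/* simplified integer key */
};

struct ht_bucket {
	struct ht_elem *htb_head;	/* head of this bucket's chain */
	pthread_mutex_t htb_lock;	/* per-bucket lock: the partition point */
};

struct hashtab {
	unsigned long ht_nbuckets;	/* number of buckets */
	struct ht_bucket ht_bkt[];	/* flexible array of buckets */
};

/* Map a hash value to its bucket with a simple modulus, as HASH2BKT()
 * does in the text. */
static struct ht_bucket *hash2bkt(struct hashtab *htp, unsigned long hash)
{
	return &htp->ht_bkt[hash % htp->ht_nbuckets];
}

struct hashtab *hashtab_alloc(unsigned long nbuckets)
{
	struct hashtab *htp = malloc(sizeof(*htp) +
				     nbuckets * sizeof(struct ht_bucket));

	if (!htp)
		return NULL;
	htp->ht_nbuckets = nbuckets;
	for (unsigned long i = 0; i < nbuckets; i++) {
		htp->ht_bkt[i].htb_head = NULL;
		pthread_mutex_init(&htp->ht_bkt[i].htb_lock, NULL);
	}
	return htp;
}

/* Add an element, holding only the one bucket's lock. */
void hashtab_add(struct hashtab *htp, struct ht_elem *htep,
		 unsigned long hash)
{
	struct ht_bucket *htb = hash2bkt(htp, hash);

	htep->hte_hash = hash;
	pthread_mutex_lock(&htb->htb_lock);
	htep->hte_next = htb->htb_head;
	htb->htb_head = htep;
	pthread_mutex_unlock(&htb->htb_lock);
}

/* Look up a key, checking the cached hash first to skip mismatches
 * cheaply, as the ->hte_hash field in the text is meant to allow. */
struct ht_elem *hashtab_lookup(struct hashtab *htp, unsigned long hash,
			       unsigned long key)
{
	struct ht_bucket *htb = hash2bkt(htp, hash);
	struct ht_elem *htep;

	pthread_mutex_lock(&htb->htb_lock);
	for (htep = htb->htb_head; htep; htep = htep->hte_next)
		if (htep->hte_hash == hash && htep->key == key)
			break;
	pthread_mutex_unlock(&htb->htb_lock);
	return htep;
}
```

Because updates to different buckets take different locks, this fixed-size variant partitions perfectly; the resizable table in the patch above layers RCU and a second pointer set on top of exactly this skeleton so that the bucket array itself can be replaced.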