Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx> --- defer/rcu.tex | 6 ++-- defer/rcufundamental.tex | 74 ++++++++++++++++++++-------------------- defer/rcuintro.tex | 74 ++++++++++++++++++++-------------------- 3 files changed, 77 insertions(+), 77 deletions(-) diff --git a/defer/rcu.tex b/defer/rcu.tex index 979a5b28..f6fa6029 100644 --- a/defer/rcu.tex +++ b/defer/rcu.tex @@ -11,11 +11,11 @@ All of the mechanisms discussed in the preceding sections used one of a number of approaches to defer specific actions until they may be carried out safely. The reference counters discussed in -Section~\ref{sec:defer:Reference Counting} +\cref{sec:defer:Reference Counting} use explicit counters to defer actions that could disturb readers, which results in read-side contention and thus poor scalability. The hazard pointers covered by -Section~\ref{sec:defer:Hazard Pointers} +\cref{sec:defer:Hazard Pointers} use implicit counters in the guise of per-thread lists of pointers. This avoids read-side contention, but requires readers to do stores and conditional branches, as well as either full memory barriers in read-side @@ -27,7 +27,7 @@ update-side primitives.\footnote{ open-source library (\url{https://github.com/facebook/folly}).} The sequence lock presented in -Section~\ref{sec:defer:Sequence Locks} +\cref{sec:defer:Sequence Locks} also avoids read-side contention, but does not protect pointer traversals and, like hazard pointers, requires either full memory barriers in read-side primitives, or inter-processor interrupts in update-side diff --git a/defer/rcufundamental.tex b/defer/rcufundamental.tex index 00b80d63..9ab6e98d 100644 --- a/defer/rcufundamental.tex +++ b/defer/rcufundamental.tex @@ -14,16 +14,16 @@ wish to skip the underlying fundamentals presented in this section.
RCU is made up of three fundamental mechanisms, the first being used for insertion, the second being used for deletion, and the third being used to allow readers to tolerate concurrent insertions and deletions. -Section~\ref{sec:defer:Publish-Subscribe Mechanism} +\Cref{sec:defer:Publish-Subscribe Mechanism} describes the publish-subscribe mechanism used for insertion, -Section~\ref{sec:defer:Wait For Pre-Existing RCU Readers} +\cref{sec:defer:Wait For Pre-Existing RCU Readers} describes how waiting for pre-existing RCU readers enables deletion, and -Section~\ref{sec:defer:Maintain Multiple Versions of Recently Updated Objects} +\cref{sec:defer:Maintain Multiple Versions of Recently Updated Objects} discusses how maintaining multiple versions of recently updated objects permits concurrent insertions and deletions. Finally, -Section~\ref{sec:defer:Summary of RCU Fundamentals} +\cref{sec:defer:Summary of RCU Fundamentals} summarizes RCU fundamentals. \subsubsection{Publish-Subscribe Mechanism} @@ -48,20 +48,20 @@ For their part, updaters can be thought of as publishing new versions. \end{figure} Unfortunately, as laid out in -Section~\ref{sec:toolsoftrade:Shared-Variable Shenanigans} +\cref{sec:toolsoftrade:Shared-Variable Shenanigans} and reiterated in -Section~\ref{sec:defer:Minimal Insertion and Deletion}, +\cref{sec:defer:Minimal Insertion and Deletion}, it is unwise to use \IXplx{plain access}{es} for these publication and subscription operations. It is instead necessary to inform both the compiler and the CPU of the need for care, as can be seen from -Figure~\ref{fig:defer:Publication/Subscription Constraints}, +\cref{fig:defer:Publication/Subscription Constraints}, which illustrates interactions between concurrent executions of \co{ins_route()} (and its caller) and \co{read_gptr()} from -Listing~\ref{lst:defer:Insertion and Deletion With Concurrent Readers}. +\cref{lst:defer:Insertion and Deletion With Concurrent Readers}.
The \co{ins_route()} column from -Figure~\ref{fig:defer:Publication/Subscription Constraints} +\cref{fig:defer:Publication/Subscription Constraints} shows \co{ins_route()}'s caller allocating a new \co{route} structure, which then contains pre-initialization garbage. The caller then initializes the newly allocated structure, and then @@ -95,7 +95,7 @@ mechanism will be required to mediate among multiple concurrent In the Linux kernel, locking is the mechanism of choice, but pretty much any synchronization mechanism may be used. An example of a particularly lightweight synchronization mechanism is -Chapter~\ref{chp:Data Ownership}'s data ownership: If each pointer is +\cref{chp:Data Ownership}'s data ownership: If each pointer is owned by a particular thread, then that thread may execute \co{rcu_assign_pointer()} on that pointer with no additional synchronization overhead. @@ -123,7 +123,7 @@ atomic in the sense that the value loaded must be that from a single store, for example, the compiler must not tear the load.\footnote{ That is, the compiler must not break the load into multiple smaller loads, as described under ``load tearing'' in - Section~\ref{sec:toolsoftrade:Shared-Variable Shenanigans}.} + \cref{sec:toolsoftrade:Shared-Variable Shenanigans}.} Unfortunately, compiler support for \co{rcu_dereference()} is at best a work in progress~\cite{PaulEMcKennneyConsumeP0190R4,PaulEMcKenney2017markconsumeP0462R1,JFBastien2018P0750R1consume}. In the meantime, the Linux kernel relies on volatile loads, the details of @@ -134,7 +134,7 @@ However, on other architectures, \co{rcu_dereference()} typically emits a single load instruction, just as would the equivalent single-threaded code. The coding restrictions are described in more detail in -Section~\ref{sec:memorder:Address- and Data-Dependency Difficulties}, +\cref{sec:memorder:Address- and Data-Dependency Difficulties}, however, the common case of field selection (\qtco{->}) works quite well. 
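The publication/subscription pairing described above can be sketched in portable C, with a C11 release store standing in for \co{rcu_assign_pointer()} and an acquire load standing in for \co{rcu_dereference()}. This is a toy single-file model, not the kernel API; the \co{toy_*} names and the \co{route} fields are illustrative assumptions.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy model of the publish/subscribe pair.  A C11 release store
 * stands in for rcu_assign_pointer() and an acquire load for
 * rcu_dereference(); the names below are illustrative only. */
struct route {
	unsigned long addr;
	unsigned long iface;
};

static _Atomic(struct route *) gptr;

/* Publish: fully initialize the structure, then release-store the
 * pointer, so that no reader can see pre-initialization garbage. */
static struct route *toy_publish(unsigned long addr, unsigned long iface)
{
	struct route *rp = malloc(sizeof(*rp));

	rp->addr = addr;	/* initialization happens ...	*/
	rp->iface = iface;	/* ... before publication.	*/
	atomic_store_explicit(&gptr, rp, memory_order_release);
	return rp;
}

/* Subscribe: acquire-load the pointer before dereferencing it. */
static struct route *toy_subscribe(void)
{
	return atomic_load_explicit(&gptr, memory_order_acquire);
}
```

A subscriber thus sees either \co{NULL} or a fully initialized structure, which is exactly the "OK" outcome from the constraints figure.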
Software that does not require the ultimate in read-side performance can instead use C11 acquire loads, which provide the needed ordering and @@ -145,7 +145,7 @@ will appear in due course. In short, use of \co{rcu_assign_pointer()} for publishing pointers and use of \co{rcu_dereference()} for subscribing to them successfully avoids the ``Not OK'' garbage loads depicted in -Figure~\ref{fig:defer:Publication/Subscription Constraints}. +\cref{fig:defer:Publication/Subscription Constraints}. These two primitives can therefore be used to add new data to linked structures without disrupting concurrent readers. @@ -201,7 +201,7 @@ using explicit tracking. In RCU's case, each of the things waited on is called an \emph{RCU read-side critical section}. As hinted at in -Section~\ref{sec:defer:Toy Implementation}, an RCU read-side critical +\cref{sec:defer:Toy Implementation}, an RCU read-side critical section starts with an \co{rcu_read_lock()} primitive, and ends with a corresponding \co{rcu_read_unlock()} primitive. RCU read-side critical sections can be nested, and may contain pretty @@ -225,7 +225,7 @@ waiting~\cite{MathieuDesnoyers2012URCU,McKenney:2013:SDS:2483852.2483867}. The relationship between an RCU read-side critical section and a later RCU grace period is an if-then relationship, as illustrated by -Figure~\ref{fig:defer:RCU Reader and Later Grace Period}. +\cref{fig:defer:RCU Reader and Later Grace Period}. If any portion of a given critical section precedes the beginning of a given grace period, then RCU guarantees that all of that critical section will precede the end of that grace period. @@ -239,7 +239,7 @@ is guaranteed to also be 0. \QuickQuiz{ What other final values of \co{r1} and \co{r2} are possible in - Figure~\ref{fig:defer:RCU Reader and Later Grace Period}? + \cref{fig:defer:RCU Reader and Later Grace Period}? }\QuickQuizAnswer{ The \co{r1 == 0 && r2 == 0} possibility was called out in the text. 
Given that \co{r1 == 0} implies \co{r2 == 0}, we know that @@ -257,7 +257,7 @@ is guaranteed to also be 0. The relationship between an RCU read-side critical section and an earlier RCU grace period is also an if-then relationship, as illustrated by -Figure~\ref{fig:defer:RCU Reader and Earlier Grace Period}. +\cref{fig:defer:RCU Reader and Earlier Grace Period}. If any portion of a given critical section follows the end of a given grace period, then RCU guarantees that all of that critical section will follow the beginning of that grace period. @@ -288,7 +288,7 @@ is guaranteed to also be 1. \end{figure} Finally, as shown in -Figure~\ref{fig:defer:RCU Reader Within Grace Period}, +\cref{fig:defer:RCU Reader Within Grace Period}, an RCU read-side critical section can be completely overlapped by an RCU grace period. In this case, \co{r1}'s final value is 1 and \co{r2}'s final value is 0. @@ -309,9 +309,9 @@ This definition is sufficient for almost all RCU-based algorithms, but for those wanting more, simple executable formal models of RCU are available as part of Linux kernel v4.17 and later, as discussed in -Section~\ref{sec:formal:Axiomatic Approaches and RCU}. +\cref{sec:formal:Axiomatic Approaches and RCU}. In addition, RCU's ordering properties are examined in much -greater detail in Section~\ref{sec:memorder:RCU}. +greater detail in \cref{sec:memorder:RCU}. \QuickQuiz{ What would happen if \co{P0()}'s accesses in @@ -349,7 +349,7 @@ order the assignment of values to variables as shown in {fig:defer:RCU Reader Within Grace Period}, it is more frequently used to safely free data elements removed from a linked structure, as was done in -Section~\ref{sec:defer:Introduction to RCU}. +\cref{sec:defer:Introduction to RCU}. The general process is illustrated by the following pseudocode: \begin{enumerate} @@ -377,7 +377,7 @@ This more abstract procedure requires a more abstract diagram than which are specific to a particular litmus test. 
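The remove-then-wait-then-free sequence in the pseudocode above can be sketched as a single-threaded C program. Here \co{synchronize_rcu()} is reduced to a counting stub, since the point is only the ordering of the three steps; the stub and the list layout are assumptions of this sketch, not the real grace-period machinery.

```c
#include <stdlib.h>

/* Single-threaded sketch of the remove/wait/free sequence.  The
 * grace-period wait is a counting stub; real RCU would block here
 * until all pre-existing readers had finished. */
struct route {
	unsigned long addr;
	struct route *next;
};

static struct route *head;
static int grace_periods;	/* completed (stub) grace periods */

static void synchronize_rcu_stub(void)
{
	grace_periods++;
}

/* Step 1: unlink the element so that no new reader can find it. */
static struct route *remove_route(unsigned long addr)
{
	struct route **pp = &head;
	struct route *p;

	while ((p = *pp) != NULL) {
		if (p->addr == addr) {
			*pp = p->next;	/* unreachable to new readers */
			return p;
		}
		pp = &p->next;
	}
	return NULL;
}

/* Steps 2 and 3: wait out pre-existing readers, then reclaim. */
static int delete_route(unsigned long addr)
{
	struct route *p = remove_route(addr);

	if (!p)
		return -1;
	synchronize_rcu_stub();	/* wait for pre-existing readers */
	free(p);		/* safe: no reader can still hold p */
	return 0;
}
```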
After all, an RCU implementation must work correctly regardless of the form of the RCU updates and the RCU read-side critical sections. -Figure~\ref{fig:defer:Summary of RCU Grace-Period Ordering Guarantees} +\Cref{fig:defer:Summary of RCU Grace-Period Ordering Guarantees} fills this need, showing the four possible scenarios, with time advancing from top to bottom within each scenario. Within each scenario, an RCU reader is represented by the left-hand @@ -438,9 +438,9 @@ shows that calls to \co{ins_route()} can also result in concurrent readers seeing different versions: Either the initial empty list or the newly inserted \co{route} structure. Note that both reference counting -(Section~\ref{sec:defer:Reference Counting}) +(\cref{sec:defer:Reference Counting}) and hazard pointers -(Section~\ref{sec:defer:Hazard Pointers}) +(\cref{sec:defer:Hazard Pointers}) can also cause concurrent readers to see different versions, but RCU's lightweight readers make this more likely. @@ -454,11 +454,11 @@ RCU's lightweight readers make this more likely. However, maintaining multiple weakly consistent versions can provide some surprises. For example, consider -Figure~\ref{fig:defer:Multiple RCU Data-Structure Versions}, +\cref{fig:defer:Multiple RCU Data-Structure Versions}, in which a reader is traversing a linked list that is concurrently updated.\footnote{ RCU linked-list APIs may be found in - Section~\ref{sec:defer:RCU Linux-Kernel API}.} + \cref{sec:defer:RCU Linux-Kernel API}.} In the first row of the figure, the reader is referencing data item~A, and in the second row, it advances to~B, having thus far seen A followed by~B\@. In the third row, an updater removes element~A and in the fourth row @@ -468,10 +468,10 @@ seeing elements~A through~E\@. Except that there was no time at which such a list existed.
This situation might be even more surprising than that shown in -Figure~\ref{fig:defer:Deletion With Concurrent Readers}, +\cref{fig:defer:Deletion With Concurrent Readers}, in which different concurrent readers see different versions. In contrast, in -Figure~\ref{fig:defer:Multiple RCU Data-Structure Versions} +\cref{fig:defer:Multiple RCU Data-Structure Versions} the reader sees a version that never actually existed! One way to resolve this strange situation is via weaker semantics. @@ -492,7 +492,7 @@ the engineering life is all about choices and tradeoffs. Strange though this situation might seem, it is entirely consistent with the real world. As we saw in -Section~\ref{sec:cpu:Overheads}, +\cref{sec:cpu:Overheads}, the finite speed of light cannot be ignored within a computer system, and it most certainly cannot be ignored outside of this system. This in turn means that any data within the system representing state @@ -511,7 +511,7 @@ In many cases, these algorithms are also perfectly capable of dealing with inconsistencies within the system. The pre-BSD packet routing example laid out in -Section~\ref{sec:defer:Running Example} +\cref{sec:defer:Running Example} is a case in point. The contents of a routing list are set by routing protocols, and these protocols feature significant delays (seconds or even minutes) to avoid @@ -530,7 +530,7 @@ security policies (often set by committees of humans), storage configuration, and WiFi access points, to say nothing of removable hardware such as microphones, headsets, cameras, mice, printers, and much else besides.
Furthermore, the large number of Linux-kernel RCU API uses shown in -Figure~\ref{fig:defer:RCU Usage in the Linux Kernel}, +\cref{fig:defer:RCU Usage in the Linux Kernel}, combined with the Linux kernel's heavy use of reference counting and with increasing use of hazard pointers in other projects, demonstrates that tolerance for such inconsistencies is more common than one might @@ -545,8 +545,8 @@ RCU readers can be considered to be fully ordered with updaters, despite the fact that these readers might be executing the exact same sequence of machine instructions that would be executed by a single-threaded program. For example, referring back to -Listing~\ref{lst:defer:Insertion and Deletion With Concurrent Readers} -on page~\pageref{lst:defer:Insertion and Deletion With Concurrent Readers}, +\cref{lst:defer:Insertion and Deletion With Concurrent Readers} +on \cpageref{lst:defer:Insertion and Deletion With Concurrent Readers}, suppose that each reader thread invokes \co{access_route()} exactly once during its lifetime, and that there is no other communication among reader and updater threads. @@ -579,7 +579,7 @@ versions of the list active at a given time. versions of the list to be active? }\QuickQuizAnswerB{ One way of accomplishing this is as shown in - Listing~\ref{lst:defer:Concurrent RCU Deletion}. + \cref{lst:defer:Concurrent RCU Deletion}. \begin{listing} \begin{VerbatimL} @@ -614,14 +614,14 @@ else { \co{list_replace_rcu()} were protected by a lock, so that the \co{synchronize_rcu()} was outside of that lock, similar to the code shown in - Listing~\ref{lst:defer:Concurrent RCU Deletion}. + \cref{lst:defer:Concurrent RCU Deletion}. Suppose further that a large number of threads undertook an RCU replacement at about the same time, and that readers are also constantly traversing the data structure. 
Then the following sequence of events could occur, starting from the end state of - Figure~\ref{fig:defer:Multiple RCU Data-Structure Versions}: + \cref{fig:defer:Multiple RCU Data-Structure Versions}: \begin{enumerate} \item Thread~A traverses the list, obtaining a reference to @@ -665,7 +665,7 @@ algorithms: \item a publish-subscribe mechanism for adding new data, \item a way of waiting for pre-existing RCU readers to finish - (see Section~\ref{sec:memorder:RCU} for more detail), + (see \cref{sec:memorder:RCU} for more detail), and \item a discipline of maintaining multiple versions to permit diff --git a/defer/rcuintro.tex b/defer/rcuintro.tex index 944f095c..1bfe6265 100644 --- a/defer/rcuintro.tex +++ b/defer/rcuintro.tex @@ -34,7 +34,7 @@ data structure, which consists of a single global pointer that is either Minimal though it might be, this data structure is heavily used in production~\cite{GeoffRomer2018C++DeferredReclamationP0561R4}. A classic approach for insertion is shown in -Figure~\ref{fig:defer:Insertion With Concurrent Readers}, +\cref{fig:defer:Insertion With Concurrent Readers}, which shows four states with time advancing from top to bottom. The first row shows the initial state, with \co{gptr} equal to \co{NULL}. In the second row, we have allocated a structure which is uninitialized, @@ -46,7 +46,7 @@ reference the newly allocated and initialized element. We might hope that this assignment to \co{gptr} could use a simple C-language assignment statement. Unfortunately, -Section~\ref{sec:toolsoftrade:Shared-Variable Shenanigans} +\cref{sec:toolsoftrade:Shared-Variable Shenanigans} dashes these hopes. 
Therefore, the updater cannot use a simple C-language assignment, but must instead use \co{smp_store_release()} as shown in the figure, @@ -56,7 +56,7 @@ Similarly, one might hope that readers could use a single C-language assignment to fetch the value of \co{gptr}, and be guaranteed to either get the old value of \co{NULL} or to get the newly installed pointer, but either way see a valid result. -Unfortunately, Section~\ref{sec:toolsoftrade:Shared-Variable Shenanigans} +Unfortunately, \cref{sec:toolsoftrade:Shared-Variable Shenanigans} dashes these hopes as well. To obtain this guarantee, readers must instead use \co{READ_ONCE()}, or, as will be seen, \co{rcu_dereference()}. @@ -88,10 +88,10 @@ and scalability, and also is eminently suitable for real-time use. Insertion is of course quite useful, but sooner or later, it will also be necessary to delete data. As can be seen in -Figure~\ref{fig:defer:Deletion With Concurrent Readers}, +\cref{fig:defer:Deletion With Concurrent Readers}, the first step is easy. Again taking the lessons from -Section~\ref{sec:toolsoftrade:Shared-Variable Shenanigans} +\cref{sec:toolsoftrade:Shared-Variable Shenanigans} to heart, \co{smp_store_release()} is used to \co{NULL} the pointer, thus moving from the first row to the second in the figure. At this point, pre-existing readers see the old structure with @@ -102,7 +102,7 @@ the state, as indicated by the ``2 Versions'' in the figure. \QuickQuizSeries{% \QuickQuizB{ Why does - Figure~\ref{fig:defer:Deletion With Concurrent Readers} + \cref{fig:defer:Deletion With Concurrent Readers} use \co{smp_store_release()} given that it is storing a \co{NULL} pointer? Wouldn't \co{WRITE_ONCE()} work just as well in this case, @@ -128,14 +128,14 @@ the state, as indicated by the ``2 Versions'' in the figure. 
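The single-pointer data structure in the figures can be modeled with C11 atomics, a release store standing in for \co{smp_store_release()} and a plain atomic load for \co{READ_ONCE()}. The \co{toy_*} names are assumptions of this sketch, not the kernel API, and note that \co{toy_delete()} performs only the first step of deletion: the old structure must not be freed until a grace period has elapsed.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy model of the single global pointer from the figures.  A release
 * store stands in for smp_store_release() and a relaxed load for
 * READ_ONCE(); the names are illustrative only. */
struct route {
	unsigned long iface;
};

static _Atomic(struct route *) gptr;

/* Insertion: initialize, then publish with a release store. */
static void toy_insert(unsigned long iface)
{
	struct route *rp = malloc(sizeof(*rp));

	rp->iface = iface;
	atomic_store_explicit(&gptr, rp, memory_order_release);
}

/* Readers see either NULL or a fully initialized structure. */
static struct route *toy_read(void)
{
	return atomic_load_explicit(&gptr, memory_order_relaxed);
}

/* Deletion, first step only: NULL the pointer.  Pre-existing readers
 * may still hold references ("2 Versions"), so the returned structure
 * must not be freed until a grace period has elapsed. */
static struct route *toy_delete(void)
{
	return atomic_exchange_explicit(&gptr, NULL,
					memory_order_release);
}
```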
\QuickQuizE{ Readers running concurrently with each other and with the procedure outlined in - Figure~\ref{fig:defer:Deletion With Concurrent Readers} + \cref{fig:defer:Deletion With Concurrent Readers} can disagree on the value of \co{gptr}. Isn't that just a wee bit problematic??? }\QuickQuizAnswerE{ Not necessarily. - As hinted at in Sections~\ref{sec:cpu:Hardware Optimizations} - and~\ref{sec:cpu:Hardware Free Lunch?}, + As hinted at in + \cref{sec:cpu:Hardware Optimizations,sec:cpu:Hardware Free Lunch?}, speed-of-light delays mean that a computer's data is always stale compared to whatever external reality that data is intended to model. data reflecting that reality. Many of those algorithms are also able to tolerate some degree of inconsistency within the in-computer data. - Section~\ref{sec:datastruct:RCU-Protected Hash Table Discussion} + \cref{sec:datastruct:RCU-Protected Hash Table Discussion} discusses this point in more detail. Please note that this need to tolerate inconsistent and stale @@ -187,14 +187,14 @@ This question is the topic of the next section. \label{sec:defer:Waiting for Readers} It is tempting to base reader waiting on reference counting, but -Figure~\ref{fig:count:Atomic Increment Scalability on x86} +\cref{fig:count:Atomic Increment Scalability on x86} in -Chapter~\ref{chp:Counting} +\cref{chp:Counting} shows that concurrent reference counting results in extreme overhead, as we already saw in -Section~\ref{sec:defer:Reference Counting}. +\cref{sec:defer:Reference Counting}. Hazard pointers profoundly reduce this overhead, but, as we saw in -Section~\ref{sec:defer:Hazard Pointers}, not to zero. +\cref{sec:defer:Hazard Pointers}, not to zero. Nevertheless, many RCU implementations make very careful cache-local use of counters.
@@ -288,8 +288,8 @@ Within a non-preemptive operating-system kernel, for context switch to be a valid quiescent state, readers must be prohibited from blocking while referencing a given instance data structure obtained via the \co{gptr} pointer shown in -Figures~\ref{fig:defer:Insertion With Concurrent Readers} -and~\ref{fig:defer:Deletion With Concurrent Readers}. +\cref{fig:defer:Insertion With Concurrent Readers,% +fig:defer:Deletion With Concurrent Readers}. This no-blocking constraint is consistent with similar constraints on pure spinlocks, where a CPU is forbidden from blocking while holding a spinlock. @@ -305,7 +305,7 @@ Again, this same constraint is imposed on reader threads dereferencing \co{gptr}: such threads are not allowed to block until after they are done using the pointed-to data item. Returning to the second row of -Figure~\ref{fig:defer:Deletion With Concurrent Readers}, +\cref{fig:defer:Deletion With Concurrent Readers}, where the updater has just completed executing the \co{smp_store_release()}, imagine that CPU~0 executes a context switch. Because readers are not permitted to block while traversing the linked @@ -317,7 +317,7 @@ prior readers have completed, and that there are no longer any reader threads referencing the newly removed data element. The updater can then safely free that data element, resulting in the state shown at the bottom of -Figure~\ref{fig:defer:Deletion With Concurrent Readers}. +\cref{fig:defer:Deletion With Concurrent Readers}. \begin{figure} \centering @@ -329,7 +329,7 @@ Figure~\ref{fig:defer:Deletion With Concurrent Readers}. This approach is termed \emph{quiescent state based reclamation} (QSBR)~\cite{ThomasEHart2006a}. A QSBR schematic is shown in -Figure~\ref{fig:defer:QSBR: Waiting for Pre-Existing Readers}, +\cref{fig:defer:QSBR: Waiting for Pre-Existing Readers}, with time advancing from the top of the figure to the bottom. 
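One way the careful cache-local use of counters mentioned earlier can detect the end of a grace period is by snapshotting per-CPU quiescent-state counters: the grace period ends once every CPU's counter has advanced past its snapshot. The following is a toy sketch of that bookkeeping only; the fixed CPU count, function names, and absence of any real waiting or memory ordering are all assumptions of this model.

```c
/* Sketch of counter-snapshot bookkeeping behind QSBR: a grace period
 * has elapsed once every CPU has passed through a quiescent state
 * since the snapshot was taken.  Toy model only: fixed CPU count,
 * no concurrency, no memory ordering. */
#define NR_CPUS 4

static unsigned long qs_ctr[NR_CPUS];

/* Called from each CPU's quiescent-state (e.g. context-switch) path. */
static void note_quiescent_state(int cpu)
{
	qs_ctr[cpu]++;
}

/* Record the per-CPU counters at the start of a grace period. */
static void gp_snapshot(unsigned long snap[NR_CPUS])
{
	for (int i = 0; i < NR_CPUS; i++)
		snap[i] = qs_ctr[i];
}

/* The grace period ends once every counter has advanced. */
static int gp_elapsed(const unsigned long snap[NR_CPUS])
{
	for (int i = 0; i < NR_CPUS; i++)
		if (qs_ctr[i] == snap[i])
			return 0;	/* CPU i: no quiescent state yet */
	return 1;
}
```

Note that a CPU that passes through many quiescent states advances the grace period no faster: one pass per CPU suffices.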
CPU~1 does the \co{WRITE_ONCE()} that removes the current data item (presumably having previously read the pointer value and @@ -346,7 +346,7 @@ have completed, so the grace period ends, permitting CPU~1 to free the old data item. \QuickQuiz{ - In Figure~\ref{fig:defer:QSBR: Waiting for Pre-Existing Readers}, + In \cref{fig:defer:QSBR: Waiting for Pre-Existing Readers}, the last of CPU~3's readers that could possibly have access to the old data item ended before the grace period even started! @@ -443,25 +443,25 @@ Linux-kernel environments can be as simple as defining and \co{preempt_enable()}, respectively.\footnote{ Some toy RCU implementations that handle preempted read-side critical sections are shown in - Appendix~\ref{chp:app:``Toy'' RCU Implementations}.} + \cref{chp:app:``Toy'' RCU Implementations}.} However, this simple non-preemptible approach is conceptually complete, and demonstrates that it really is possible to provide read-side synchronization at zero cost, even in the face of concurrent updates. In fact, -Listing~\ref{lst:defer:Insertion and Deletion With Concurrent Readers} +\cref{lst:defer:Insertion and Deletion With Concurrent Readers} shows how reading (\co{access_route()}), -Figure~\ref{fig:defer:Insertion With Concurrent Readers}'s +\cref{fig:defer:Insertion With Concurrent Readers}'s insertion (\co{ins_route()}) and -Figure~\ref{fig:defer:Deletion With Concurrent Readers}'s +\cref{fig:defer:Deletion With Concurrent Readers}'s deletion (\co{del_route()}) can be implemented. (A slightly more capable routing table is shown in -Section~\ref{sec:defer:RCU for Pre-BSD Routing}.) +\cref{sec:defer:RCU for Pre-BSD Routing}.) \QuickQuizSeries{% \QuickQuizB{ What is the point of \co{rcu_read_lock()} and \co{rcu_read_unlock()} in - Listing~\ref{lst:defer:Insertion and Deletion With Concurrent Readers}? + \cref{lst:defer:Insertion and Deletion With Concurrent Readers}? Why not just let the quiescent states speak for themselves? 
}\QuickQuizAnswerB{ Recall that readers are not permitted to pass through a quiescent @@ -477,7 +477,7 @@ Section~\ref{sec:defer:RCU for Pre-BSD Routing}.) \QuickQuizE{ What is the point of \co{rcu_dereference()}, \co{rcu_assign_pointer()} and \co{RCU_INIT_POINTER()} in - Listing~\ref{lst:defer:Insertion and Deletion With Concurrent Readers}? + \cref{lst:defer:Insertion and Deletion With Concurrent Readers}? Why not just use \co{READ_ONCE()}, \co{smp_store_release()}, and \co{WRITE_ONCE()}, respectively? }\QuickQuizAnswerE{ @@ -489,14 +489,14 @@ Section~\ref{sec:defer:RCU for Pre-BSD Routing}.) } Referring back to -Listing~\ref{lst:defer:Insertion and Deletion With Concurrent Readers}, +\cref{lst:defer:Insertion and Deletion With Concurrent Readers}, note that \co{route_lock} is used to synchronize between concurrent updaters invoking \co{ins_route()} and \co{del_route()}. However, this lock is not acquired by readers invoking \co{access_route()}: Readers are instead protected by the QSBR techniques described in this section. Note that \co{ins_route()} simply returns the old value of \co{gptr}, which -Figure~\ref{fig:defer:Insertion With Concurrent Readers} assumed would +\cref{fig:defer:Insertion With Concurrent Readers} assumed would always be \co{NULL}. This means that it is the caller's responsibility to figure out what to do with a non-\co{NULL} value, a task complicated by the fact that @@ -522,14 +522,14 @@ In contrast, \co{del_route()} uses \co{synchronize_rcu()} and an RCU read-side critical section? }\QuickQuizAnswer{ A \co{call_rcu()} function, which is described in - Section~\ref{sec:defer:Wait For Pre-Existing RCU Readers}, + \cref{sec:defer:Wait For Pre-Existing RCU Readers}, permits asynchronous grace-period waits. }\QuickQuizEnd This example shows one general approach to reading and updating RCU-protected data structures, however, there is quite a variety of use cases, several of which are covered in -Section~\ref{sec:defer:RCU Usage}. 
+\cref{sec:defer:RCU Usage}. In summary, it is in fact possible to create concurrent linked data structures that can be traversed by readers executing the same sequence @@ -550,7 +550,7 @@ and degrading scalability, but also typically prohibiting readers and updaters from making useful concurrent forward progress. \QuickQuiz{ - Doesn't Section~\ref{sec:defer:Sequence Locks}'s seqlock + Doesn't \cref{sec:defer:Sequence Locks}'s seqlock also permit readers and updaters to make useful concurrent forward progress? }\QuickQuizAnswer{ @@ -567,13 +567,13 @@ updaters from making useful concurrent forward progress. of concurrent RCU updaters. However, both reference counters - (Section~\ref{sec:defer:Reference Counting}) + (\cref{sec:defer:Reference Counting}) and hazard pointers - (Section~\ref{sec:defer:Hazard Pointers}) + (\cref{sec:defer:Hazard Pointers}) really do permit useful concurrent forward progress for both updaters and readers, just at somewhat greater cost. Please see - Section~\ref{sec:defer:Which to Choose?} + \cref{sec:defer:Which to Choose?} for a comparison of these different solutions to the deferred-reclamation problem. }\QuickQuizEnd @@ -611,9 +611,9 @@ RCU has been used in the Linux kernel since October 2002~\cite{Torvalds2.5.43}. Use of the RCU API has increased substantially since that time, as can be seen in -Figure~\ref{fig:defer:RCU Usage in the Linux Kernel}. +\cref{fig:defer:RCU Usage in the Linux Kernel}. In fact, code very similar to that in -Listing~\ref{lst:defer:Insertion and Deletion With Concurrent Readers} +\cref{lst:defer:Insertion and Deletion With Concurrent Readers} is used in the Linux kernel. RCU has enjoyed heavy use both prior to and since its acceptance in the Linux kernel, as discussed in -- 2.17.1
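As a closing sketch, the division of labor in the listing's \co{access_route()}, \co{ins_route()}, and \co{del_route()} can be followed in a single-threaded model with the RCU primitives reduced to stubs. Everything beyond the three function names from the text is an assumption: the update-side \co{route_lock} is omitted, and the stubs generate no code, just as \co{rcu_read_lock()} and \co{rcu_read_unlock()} may in the non-preemptible case.

```c
#include <stddef.h>

/* Single-threaded sketch in the spirit of access_route(),
 * ins_route(), and del_route().  The RCU primitives are reduced to
 * stubs so the control flow can be followed in isolation; the
 * update-side lock is omitted from this toy model. */
struct route {
	unsigned long addr;
	unsigned long iface;
};

static struct route *gptr;

#define rcu_read_lock()		do { } while (0)
#define rcu_read_unlock()	do { } while (0)

static void synchronize_rcu_stub(void)
{
	/* Real implementation: wait for all pre-existing readers. */
}

/* Reader: protected only by the read-side markers above. */
static int access_route(unsigned long addr, unsigned long *iface)
{
	struct route *rp;
	int ret = -1;

	rcu_read_lock();
	rp = gptr;		/* stands in for rcu_dereference() */
	if (rp != NULL && rp->addr == addr) {
		*iface = rp->iface;
		ret = 0;
	}
	rcu_read_unlock();
	return ret;
}

/* Updater: publish a new route and return the old pointer, which the
 * caller must dispose of (after a grace period if non-NULL). */
static struct route *ins_route(struct route *rp)
{
	struct route *old = gptr;

	gptr = rp;		/* stands in for rcu_assign_pointer() */
	return old;
}

/* Updater: unpublish, wait out pre-existing readers, and return the
 * old route, now safe for the caller to free. */
static struct route *del_route(void)
{
	struct route *old = gptr;

	gptr = NULL;
	synchronize_rcu_stub();
	return old;
}
```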