Also fix indent by white spaces. Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx> --- defer/rcuapi.tex | 90 ++++++++++---------- defer/rcuexercises.tex | 12 +-- defer/rcuintro.tex | 2 +- defer/rcurelated.tex | 10 +-- defer/rcuusage.tex | 181 ++++++++++++++++++++-------------------- defer/updates.tex | 4 +- defer/whichtochoose.tex | 18 ++-- 7 files changed, 158 insertions(+), 159 deletions(-) diff --git a/defer/rcuapi.tex b/defer/rcuapi.tex index a7de666c..1554463d 100644 --- a/defer/rcuapi.tex +++ b/defer/rcuapi.tex @@ -9,23 +9,23 @@ This section looks at RCU from the viewpoint of its Linux-kernel API\@.\footnote{ Userspace RCU's API is documented elsewhere~\cite{PaulMcKenney2013LWNURCU}.} -Section~\ref{sec:defer:RCU has a Family of Wait-to-Finish APIs} +\Cref{sec:defer:RCU has a Family of Wait-to-Finish APIs} presents RCU's wait-to-finish APIs, -Section~\ref{sec:defer:RCU has Publish-Subscribe and Version-Maintenance APIs} +\cref{sec:defer:RCU has Publish-Subscribe and Version-Maintenance APIs} presents RCU's publish-subscribe and version-maintenance APIs, -Section~\ref{sec:defer:RCU has List-Processing APIs} +\cref{sec:defer:RCU has List-Processing APIs} presents RCU's list-processing APIs, -Section~\ref{sec:defer:RCU Has Diagnostic APIs} +\cref{sec:defer:RCU Has Diagnostic APIs} presents RCU's diagnostic APIs, and -Section~\ref{sec:defer:Where Can RCU's APIs Be Used?} +\cref{sec:defer:Where Can RCU's APIs Be Used?} describes in which contexts RCU's various APIs may be used. Finally, -Section~\ref{sec:defer:So; What is RCU Really?} +\cref{sec:defer:So; What is RCU Really?} presents concluding remarks. Readers who are not excited about kernel internals may wish to skip ahead to \cref{sec:defer:RCU Usage} -on page~\pageref{sec:defer:RCU Usage}. +on \cpageref{sec:defer:RCU Usage}. 
\subsubsection{RCU has a Family of Wait-to-Finish APIs} \label{sec:defer:RCU has a Family of Wait-to-Finish APIs} @@ -34,11 +34,11 @@ The most straightforward answer to ``what is RCU'' is that RCU is an API\@. For example, the RCU implementation used in the Linux kernel is summarized by -Table~\ref{tab:defer:RCU Wait-to-Finish APIs}, +\cref{tab:defer:RCU Wait-to-Finish APIs}, which shows the wait-for-readers portions of the RCU, ``sleepable'' RCU (SRCU), Tasks RCU, and generic APIs, respectively, and by -Table~\ref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs}, +\cref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs}, which shows the publish-subscribe portions of the API~\cite{PaulEMcKenney2019RCUAPI}.\footnote{ This citation covers v4.20 and later. @@ -152,7 +152,7 @@ API~\cite{PaulEMcKenney2019RCUAPI}.\footnote{ If you are new to RCU, you might consider focusing on just one of the columns in -Table~\ref{tab:defer:RCU Wait-to-Finish APIs}, +\cref{tab:defer:RCU Wait-to-Finish APIs}, each of which summarizes one member of the Linux kernel's RCU API family. For example, if you are primarily interested in understanding how RCU is used in the Linux kernel, ``RCU'' would be the place to start, @@ -166,7 +166,7 @@ serve as a useful reference. \QuickQuiz{ Why do some of the cells in - Table~\ref{tab:defer:RCU Wait-to-Finish APIs} + \cref{tab:defer:RCU Wait-to-Finish APIs} have exclamation marks (``!'')? }\QuickQuizAnswer{ The API members with exclamation marks (\co{rcu_read_lock()}, @@ -243,7 +243,7 @@ The \co{rcu_barrier()} primitive does this job. Finally, RCU may be used to provide type-safe memory~\cite{Cheriton96a}, as described in -Section~\ref{sec:defer:RCU Provides Type-Safe Memory}. +\cref{sec:defer:RCU Provides Type-Safe Memory}. In the context of RCU, type-safe memory guarantees that a given data element will not change type during any RCU read-side critical section that accesses it. 
@@ -251,7 +251,7 @@ To make use of RCU-based type-safe memory, pass \co{SLAB_TYPESAFE_BY_RCU} to \co{kmem_cache_create()}. The ``SRCU'' column in -Table~\ref{tab:defer:RCU Wait-to-Finish APIs} +\cref{tab:defer:RCU Wait-to-Finish APIs} displays a specialized RCU API that permits general sleeping in SRCU read-side critical sections~\cite{PaulEMcKenney2006c} @@ -280,7 +280,7 @@ the expense of increased CPU overhead. \co{srcu_struct}. In practice, however, doing this is almost certainly a bad idea. In particular, the code shown in - Listing~\ref{lst:defer:Multistage SRCU Deadlocks} + \cref{lst:defer:Multistage SRCU Deadlocks} could still result in deadlock. \begin{listing} @@ -403,7 +403,7 @@ rivaling that of RCU\@. Fortunately, the RCU publish-subscribe and version-maintenance primitives shown in -Table~\ref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs} +\cref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs} apply to all of the variants of RCU discussed above. This commonality can allow more code to be shared, and reduces API proliferation. @@ -470,7 +470,7 @@ creating RCU-protected linked data structures, such as RCU-protected arrays and trees. The special case of linked lists is handled by a separate set of APIs described in -Section~\ref{sec:defer:RCU has List-Processing APIs}. +\cref{sec:defer:RCU has List-Processing APIs}. The first category publishes pointers to new data items. The \co{rcu_assign_pointer()} primitive ensures that any @@ -487,7 +487,7 @@ pointer and free the structure referenced by the old pointer. \QuickQuizB{ Normally, any pointer subject to \co{rcu_dereference()} \emph{must} always be updated using one of the pointer-publish functions in - Table~\ref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs}, + \cref{tab:defer:RCU Publish-Subscribe and Version Maintenance APIs}, for example, \co{rcu_assign_pointer()}. What is an exception to this rule? 
@@ -517,7 +517,7 @@ pointer and free the structure referenced by the old pointer. work out which type of RCU read-side critical section a given RCU traversal primitive corresponds to. For example, consider the code shown in - Listing~\ref{lst:defer:Diverse RCU Read-Side Nesting}. + \cref{lst:defer:Diverse RCU Read-Side Nesting}. \begin{listing} \begin{VerbatimL} @@ -635,11 +635,11 @@ Linux has four variants of doubly linked list, the circular \co{struct hlist_bl_head}/\co{struct hlist_bl_node} pairs. The former is laid out as shown in -Figure~\ref{fig:defer:Linux Circular Linked List (list)}, +\cref{fig:defer:Linux Circular Linked List (list)}, where the green (leftmost) boxes represent the list header and the blue (rightmost three) boxes represent the elements in the list. This notation is cumbersome, and will therefore be abbreviated as shown in -Figure~\ref{fig:defer:Linux Linked List Abbreviated}, +\cref{fig:defer:Linux Linked List Abbreviated}, which shows only the non-header (blue) elements. \begin{figure} @@ -656,16 +656,16 @@ Linux's \co{hlist}\footnote{ is a linear list, which means that it needs only one pointer for the header rather than the two required for the circular list, as shown in -Figure~\ref{fig:defer:Linux Linear Linked List (hlist)}. +\cref{fig:defer:Linux Linear Linked List (hlist)}. Thus, use of \co{hlist} can halve the memory consumption for the hash-bucket arrays of large hash tables. As before, this notation is cumbersome, so \co{hlist} structures will be abbreviated in the same way \co{list_head}-style lists are, as shown in -Figure~\ref{fig:defer:Linux Linked List Abbreviated}. +\cref{fig:defer:Linux Linked List Abbreviated}. A variant of Linux's \co{hlist}, named \co{hlist_nulls}, provides multiple distinct \co{NULL} pointers, but otherwise uses the same layout as shown in -Figure~\ref{fig:defer:Linux Linear Linked List (hlist)}. +\cref{fig:defer:Linux Linear Linked List (hlist)}. 
In this variant, a \co{->next} pointer having a zero low-order bit is considered to be a pointer. However, if the low-order bit is set to one, the upper bits identify @@ -719,7 +719,7 @@ source tree, with helpful example code provided in the Another variant of Linux's \co{hlist} incorporates bit-locking, and is named \co{hlist_bl}. This variant uses the same layout as shown in -Figure~\ref{fig:defer:Linux Linear Linked List (hlist)}, +\cref{fig:defer:Linux Linear Linked List (hlist)}, but reserves the low-order bit of the head pointer (``first'' in the figure) to lock the list. This approach also reduces memory usage, as it allows what would otherwise @@ -829,7 +829,7 @@ be a separate spinlock to be stored with the pointer itself. \end{sidewaystable*} The API members for these linked-list variants are summarized in -Table~\ref{tab:defer:RCU-Protected List APIs}. +\cref{tab:defer:RCU-Protected List APIs}. More information is available in the \path{Documentation/RCU} directory of the Linux-kernel source tree and at Linux Weekly News~\cite{PaulEMcKenney2019RCUAPI}. @@ -870,7 +870,7 @@ kfree(p); \lnlbl[kfree] \end{figure} The following discussion walks through this code, using -Figure~\ref{fig:defer:RCU Replacement in Linked List} to illustrate +\cref{fig:defer:RCU Replacement in Linked List} to illustrate the state changes. The triples in each element represent the values of fields \co{->a}, \co{->b}, and \co{->c}, respectively. @@ -889,29 +889,29 @@ with \co{5,2,3} in such a way that any given reader sees one of these two values. \begin{fcvref}[ln:defer:Canonical RCU Replacement Example (2nd)] -Line~\lnref{kmalloc} allocates a replacement element, +\Clnref{kmalloc} allocates a replacement element, resulting in the state as shown in the second row of -Figure~\ref{fig:defer:RCU Replacement in Linked List}. +\cref{fig:defer:RCU Replacement in Linked List}. 
At this point, no reader can hold a reference to the newly allocated element (as indicated by its green shading), and it is uninitialized (as indicated by the question marks). -Line~\lnref{copy} copies the old element to the new one, resulting in the +\Clnref{copy} copies the old element to the new one, resulting in the state as shown in the third row of -Figure~\ref{fig:defer:RCU Replacement in Linked List}. +\cref{fig:defer:RCU Replacement in Linked List}. The newly allocated element still cannot be referenced by readers, but it is now initialized. -Line~\lnref{update1} updates \co{q->b} to the value ``2'', and -line~\lnref{update2} updates \co{q->c} to the value ``3'', +\Clnref{update1} updates \co{q->b} to the value ``2'', and +\clnref{update2} updates \co{q->c} to the value ``3'', as shown on the fourth row of -Figure~\ref{fig:defer:RCU Replacement in Linked List}. +\cref{fig:defer:RCU Replacement in Linked List}. Note that the newly allocated structure is still inaccessible to readers. -Now, line~\lnref{replace} does the replacement, so that the new element is +Now, \clnref{replace} does the replacement, so that the new element is finally visible to readers, and hence is shaded red, as shown on the fifth row of -Figure~\ref{fig:defer:RCU Replacement in Linked List}. +\cref{fig:defer:RCU Replacement in Linked List}. At this point, as shown below, we have two versions of the list. Pre-existing readers might see the \co{5,6,7} element (which is therefore now shaded yellow), but @@ -919,7 +919,7 @@ new readers will instead see the \co{5,2,3} element. But any given reader is guaranteed to see one set of values or the other, not a mixture of the two. -After the \co{synchronize_rcu()} on line~\lnref{sync_rcu} returns, +After the \co{synchronize_rcu()} on \clnref{sync_rcu} returns, a grace period will have elapsed, and so all reads that started before the \co{list_replace_rcu()} will have completed. 
In particular, any readers that might have been holding references @@ -928,20 +928,20 @@ their RCU read-side critical sections, and are thus prohibited from continuing to hold a reference. Therefore, there can no longer be any readers holding references to the old element, as indicated its green shading in the sixth row of -Figure~\ref{fig:defer:RCU Replacement in Linked List}. +\cref{fig:defer:RCU Replacement in Linked List}. As far as the readers are concerned, we are back to having a single version of the list, but with the new element in place of the old. -After the \co{kfree()} on line~\lnref{kfree} completes, the list will +After the \co{kfree()} on \clnref{kfree} completes, the list will appear as shown on the final row of -Figure~\ref{fig:defer:RCU Replacement in Linked List}. +\cref{fig:defer:RCU Replacement in Linked List}. \end{fcvref} Despite the fact that RCU was named after the replacement case, the vast majority of RCU usage within the Linux kernel relies on the simple independent insertion and deletion, as was shown in -Figure~\ref{fig:defer:Multiple RCU Data-Structure Versions} in -Section~\ref{sec:defer:Maintain Multiple Versions of Recently Updated Objects}. +\cref{fig:defer:Multiple RCU Data-Structure Versions} in +\cref{sec:defer:Maintain Multiple Versions of Recently Updated Objects}. The next section looks at APIs that assist developers in debugging their code that makes use of RCU\@. @@ -989,7 +989,7 @@ lockdep support & \label{tab:defer:RCU Diagnostic APIs} \end{table} -Table~\ref{tab:defer:RCU Diagnostic APIs} +\Cref{tab:defer:RCU Diagnostic APIs} shows RCU's diagnostic APIs. The \co{__rcu} marks an RCU-protected pointer, for example, @@ -1081,7 +1081,7 @@ an RCU, RCU-bh, or RCU-sched read-side critical section. \label{fig:defer:RCU API Usage Constraints} \end{figure} -Figure~\ref{fig:defer:RCU API Usage Constraints} +\Cref{fig:defer:RCU API Usage Constraints} shows which APIs may be used in which in-kernel environments. 
The RCU read-side primitives may be used in any environment, including NMI, the RCU mutation and asynchronous grace-period primitives may be used in any @@ -1105,7 +1105,7 @@ to complete, and maintenance of multiple versions. That said, it is possible to build higher-level constructs on top of RCU, including the reader-writer-locking, reference-counting, and existence-guarantee constructs listed in -Section~\ref{sec:defer:RCU Usage}. +\cref{sec:defer:RCU Usage}. Furthermore, I have no doubt that the Linux community will continue to find interesting new uses for RCU, just as they do for any of a number of synchronization @@ -1116,7 +1116,7 @@ all of the things you can do with these APIs. However, for many people, a complete view of RCU must include sample RCU implementations. -Appendix~\ref{chp:app:``Toy'' RCU Implementations} therefore presents a series +\Cref{chp:app:``Toy'' RCU Implementations} therefore presents a series of ``toy'' RCU implementations of increasing complexity and capability, though others might prefer the classic ``User-Level Implementations of Read-Copy diff --git a/defer/rcuexercises.tex b/defer/rcuexercises.tex index 476819a9..d12d831c 100644 --- a/defer/rcuexercises.tex +++ b/defer/rcuexercises.tex @@ -15,7 +15,7 @@ suffice for most of these exercises. \EQuickQuiz{ The statistical-counter implementation shown in - Listing~\ref{lst:count:Per-Thread Statistical Counters} + \cref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c}) used a global lock to guard the summation in \co{read_count()}, which resulted in poor performance and negative scalability. @@ -54,14 +54,14 @@ suffice for most of these exercises. Why or why not? 
See - Section~\ref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters} + \cref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters} on - page~\pageref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters} + \cpageref{sec:together:RCU and Per-Thread-Variable-Based Statistical Counters} for more details. }\EQuickQuizEnd \EQuickQuiz{ - Section~\ref{sec:count:Applying Exact Limit Counters} + \Cref{sec:count:Applying Exact Limit Counters} showed a fanciful pair of code fragments that dealt with counting I/O accesses to removable devices. These code fragments suffered from high overhead on the fastpath @@ -78,8 +78,8 @@ suffice for most of these exercises. device-removal code fragment to suit. See - Section~\ref{sec:together:RCU and Counters for Removable I/O Devices} + \cref{sec:together:RCU and Counters for Removable I/O Devices} on - Page~\pageref{sec:together:RCU and Counters for Removable I/O Devices} + \cpageref{sec:together:RCU and Counters for Removable I/O Devices} for one solution to this problem. }\EQuickQuizEnd diff --git a/defer/rcuintro.tex b/defer/rcuintro.tex index 1bfe6265..a3c9cc58 100644 --- a/defer/rcuintro.tex +++ b/defer/rcuintro.tex @@ -145,7 +145,7 @@ the state, as indicated by the ``2 Versions'' in the figure. data reflecting that reality. Many of those algorithms are also able to tolerate some degree of inconsistency within the in-computer data. - \cref{sec:datastruct:RCU-Protected Hash Table Discussion} + \Cref{sec:datastruct:RCU-Protected Hash Table Discussion} discusses this point in more detail. Please note that this need to tolerate inconsistent and stale diff --git a/defer/rcurelated.tex b/defer/rcurelated.tex index 4c27ed8f..9061b99d 100644 --- a/defer/rcurelated.tex +++ b/defer/rcurelated.tex @@ -35,11 +35,11 @@ As of 2021, Linux-kernel RCU is still under active development. 
However, in the mid 2010s, there was a welcome upsurge in RCU research and development across a number of communities and institutions~\cite{FransKaashoek2015ParallelOSHistory}. -Section~\ref{sec:defer:RCU Uses} describes uses of RCU, -Section~\ref{sec:defer:RCU Implementations} describes RCU implementations +\Cref{sec:defer:RCU Uses} describes uses of RCU, +\cref{sec:defer:RCU Implementations} describes RCU implementations (as well as work that both creates and uses an implementation), and finally, -Section~\ref{sec:defer:RCU Validation} describes verification and validation +\cref{sec:defer:RCU Validation} describes verification and validation of RCU and its uses. \subsubsection{RCU Uses} @@ -145,7 +145,7 @@ pressed scalable non-zero indicators (SNZI)~\cite{FaithEllen:2007:SNZI} into service as a grace-period mechanism. The intended use is to implement software transactional memory -(see Section~\ref{sec:future:Transactional Memory}), which +(see \cref{sec:future:Transactional Memory}), which imposes linearizability requirements, which in turn seems to limit scalability. @@ -225,7 +225,7 @@ This paper also made some interesting performance-evaluation choices that are discussed further in \cref{sec:future:Deferred Reclamation} on -page~\ref{sec:future:Deferred Reclamation}. +\cpageref{sec:future:Deferred Reclamation}. \ppl{Adam}{Belay} et al.~created an RCU implementation that guards the data structures used by TCP/IP's address-resolution protocol (ARP) diff --git a/defer/rcuusage.tex b/defer/rcuusage.tex index 6586b27f..b84e32a4 100644 --- a/defer/rcuusage.tex +++ b/defer/rcuusage.tex @@ -40,15 +40,15 @@ This section answers the question ``What is RCU?'' from the viewpoint of the uses to which RCU can be put. Because RCU is most frequently used to replace some existing mechanism, we look at it primarily in terms of its relationship to such mechanisms, -as listed in Table~\ref{tab:defer:RCU Usage}. +as listed in \cref{tab:defer:RCU Usage}. 
Following the sections listed in this table, -Section~\ref{sec:defer:RCU Usage Summary} provides a summary. +\cref{sec:defer:RCU Usage Summary} provides a summary. \subsubsection{RCU for Pre-BSD Routing} \label{sec:defer:RCU for Pre-BSD Routing} -Listings~\ref{lst:defer:RCU Pre-BSD Routing Table Lookup} -and~\ref{lst:defer:RCU Pre-BSD Routing Table Add/Delete} +\Cref{lst:defer:RCU Pre-BSD Routing Table Lookup,% +lst:defer:RCU Pre-BSD Routing Table Add/Delete} show code for an RCU-protected Pre-BSD routing table (\path{route_rcu.c}). The former shows data structures and \co{route_lookup()}, @@ -67,19 +67,19 @@ and the latter shows \co{route_add()} and \co{route_del()}. \end{listing} \begin{fcvref}[ln:defer:route_rcu:lookup] -In Listing~\ref{lst:defer:RCU Pre-BSD Routing Table Lookup}, -line~\lnref{rh} adds the \co{->rh} field used by RCU reclamation, -line~\lnref{re_freed} adds the \co{->re_freed} use-after-free-check field, -lines~\lnref{lock}, \lnref{unlock1}, and~\lnref{unlock2} +In \cref{lst:defer:RCU Pre-BSD Routing Table Lookup}, +\clnref{rh} adds the \co{->rh} field used by RCU reclamation, +\clnref{re_freed} adds the \co{->re_freed} use-after-free-check field, +\clnref{lock,unlock1,unlock2} add RCU read-side protection, -and lines~\lnref{chk_freed} and~\lnref{abort} add the use-after-free check. +and \clnref{chk_freed,abort} add the use-after-free check. 
\end{fcvref} \begin{fcvref}[ln:defer:route_rcu:add_del] -In Listing~\ref{lst:defer:RCU Pre-BSD Routing Table Add/Delete}, -lines~\lnref{add:lock}, \lnref{add:unlock}, \lnref{del:lock}, -\lnref{del:unlock1}, and~\lnref{del:unlock2} add update-side locking, -lines~\lnref{add:add_rcu} and~\lnref{del:del_rcu} add RCU update-side protection, -line~\lnref{del:call_rcu} causes \co{route_cb()} to be invoked after +In \cref{lst:defer:RCU Pre-BSD Routing Table Add/Delete}, +\clnref{add:lock,add:unlock,del:lock,% +del:unlock1,del:unlock2} add update-side locking, +\clnref{add:add_rcu,del:del_rcu} add RCU update-side protection, +\clnref{del:call_rcu} causes \co{route_cb()} to be invoked after a grace period elapses, and \clnrefrange{cb:b}{cb:e} define \co{route_cb()}. This is minimal added code for a working concurrent implementation. @@ -92,7 +92,7 @@ This is minimal added code for a working concurrent implementation. \label{fig:defer:Pre-BSD Routing Table Protected by RCU} \end{figure} -Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU} +\Cref{fig:defer:Pre-BSD Routing Table Protected by RCU} shows the performance on the read-only workload. RCU scales quite well, and offers nearly ideal performance. However, this data was generated using the \co{RCU_SIGNAL} @@ -102,13 +102,13 @@ for which \co{rcu_read_lock()} and \co{rcu_read_unlock()} generate a small amount of code. What happens for the QSBR flavor of RCU, which generates no code at all for \co{rcu_read_lock()} and \co{rcu_read_unlock()}? -(See Section~\ref{sec:defer:Introduction to RCU}, +(See \cref{sec:defer:Introduction to RCU}, and especially -Figure~\ref{fig:defer:QSBR: Waiting for Pre-Existing Readers}, +\cref{fig:defer:QSBR: Waiting for Pre-Existing Readers}, for a discussion of RCU QSBR\@.) 
The answer to this is shown in -Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR}, +\cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR}, which shows that RCU QSBR's performance and scalability actually exceeds that of the ideal synchronization-free workload. @@ -152,7 +152,7 @@ that of the ideal synchronization-free workload. on modern microprocessors. Such results can be thought of as similar to the celebrated super-linear speedups (see - Section~\ref{sec:SMPdesign:Beyond Partitioning} + \cref{sec:SMPdesign:Beyond Partitioning} for one such example), that is, of interest but also of limited practical importance. Nevertheless, one of the strengths of RCU is that its read-side @@ -171,7 +171,7 @@ that of the ideal synchronization-free workload. \co{->re_next.next} pointer also had zero offset, just the same as the sequential variant. And the answer, as can be seen in - Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head}, + \cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR With Non-Initial rcu-head}, is that this causes RCU QSBR's performance to decrease to where it is still very nearly ideal, but no longer super-ideal. }\QuickQuizEndB @@ -240,7 +240,7 @@ These advantages and limitations are discussed in the following sections. The read-side performance advantages of Linux-kernel RCU over reader-writer locking are shown in -Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}, +\cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}, which was generated on a 448-CPU 2.10\,GHz Intel x86 system. \QuickQuizSeries{% @@ -328,7 +328,7 @@ which was generated on a 448-CPU 2.10\,GHz Intel x86 system. Of course, it is also the case that the older results were obtained on a different system than were those shown in - Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}. 
+ \cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}. So which change had the most effect, Linus's commit or the change in the system? This question is left as an exercise to the reader. @@ -336,7 +336,7 @@ which was generated on a 448-CPU 2.10\,GHz Intel x86 system. \QuickQuizE{ Why is there such large variation for the \co{rcu} trace in - Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}? + \cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}? }\QuickQuizAnswerE{ Keep in mind that this is a log-log plot, so those large-seeming \co{rcu} variances in reality span only a few hundred picoseconds. @@ -371,7 +371,7 @@ from 30~runs, with the line being the median. A more moderate view may be obtained from a \co{CONFIG_PREEMPT} kernel, though RCU still beats reader-writer locking by between a factor of seven on a single CPU and by three orders of magnitude on 192~CPUs, as shown in -Figure~\ref{fig:defer:Performance Advantage of Preemptible RCU Over Reader-Writer Locking}, +\cref{fig:defer:Performance Advantage of Preemptible RCU Over Reader-Writer Locking}, which was generated on the same 448-CPU 2.10\,GHz x86 system. Note the high variability of reader-writer locking at larger numbers of CPUs. The error bars span the full range of data. @@ -403,7 +403,7 @@ is exaggerated by the unrealistic zero-length critical sections. The performance advantages of RCU decrease as the overhead of the critical sections increase. This decrease can be seen in -Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}, +\cref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}, which was run on the same system as the previous plots. 
Here, the y-axis represents the sum of the overhead of the read-side primitives and that of the critical section and the x-axis represents @@ -413,11 +413,11 @@ separations between the traces still represent significant differences. This figure shows non-preemptible RCU, but given that preemptible RCU's read-side overhead is only about three nanoseconds, its plot would be nearly identical to -Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}. +\cref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}. \QuickQuiz{ Why the larger error ranges for the submicrosecond durations in - Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}? + \cref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of Critical-Section Duration}? }\QuickQuizAnswer{ Because smaller disturbances result in greater relative errors for smaller measurements. @@ -425,7 +425,7 @@ Figure~\ref{fig:defer:Comparison of RCU to Reader-Writer Locking as Function of is (as of 2020) less accurate than is the \co{udelay()} primitive used for the data for durations of a microsecond or more. It is instructive to compare to the zero-length case shown in - Figure~\ref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}. + \cref{fig:defer:Performance Advantage of RCU Over Reader-Writer Locking}. }\QuickQuizEnd There are three traces for reader-writer locking, with the upper trace @@ -570,7 +570,7 @@ RCU readers to finish, the RCU readers might well see the change more quickly than would batch-fair reader-writer-locking readers, as shown in -Figure~\ref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}. +\cref{fig:defer:Response Time of RCU vs. Reader-Writer Locking}. 
Once the update is received, the rwlock writer cannot proceed until the last reader completes, and subsequent readers cannot proceed until the @@ -608,7 +608,7 @@ Fortunately, there are a number of approaches that avoid inconsistency and stale data~\cite{PaulEdwardMcKenneyPhD,Arcangeli03}, and some methods based on reference counting are discussed in -Section~\ref{sec:defer:Reference Counting}. +\cref{sec:defer:Reference Counting}. \paragraph{Low-Priority RCU Readers Can Block High-Priority Reclaimers} @@ -629,7 +629,7 @@ With the exception of userspace RCU~\cite{MathieuDesnoyers2009URCU,PaulMcKenney2013LWNURCU}, expedited grace periods, and several of the ``toy'' RCU implementations described in -Appendix~\ref{chp:app:``Toy'' RCU Implementations}, +\cref{chp:app:``Toy'' RCU Implementations}, RCU grace periods extend milliseconds. Although there are a number of techniques to render such long delays harmless, including use of the asynchronous interfaces where available @@ -641,10 +641,9 @@ situations. In the best case, the conversion from reader-writer locking to RCU is quite simple, as shown in -Listings~\ref{lst:defer:Converting Reader-Writer Locking to RCU: Data}, -\ref{lst:defer:Converting Reader-Writer Locking to RCU: Search}, -and -\ref{lst:defer:Converting Reader-Writer Locking to RCU: Deletion}, +\cref{lst:defer:Converting Reader-Writer Locking to RCU: Data,% +lst:defer:Converting Reader-Writer Locking to RCU: Search,% +lst:defer:Converting Reader-Writer Locking to RCU: Deletion}, all taken from Wikipedia~\cite{WikipediaRCU}. @@ -849,7 +848,7 @@ waits for any previously acquired references to be released. Of course, RCU can also be combined with traditional reference counting, as discussed in -Section~\ref{sec:together:Refurbish Reference Counting}. +\cref{sec:together:Refurbish Reference Counting}. \begin{figure} \centering @@ -867,8 +866,8 @@ Section~\ref{sec:together:Refurbish Reference Counting}. But why bother? 
Again, part of the answer is performance, as shown in -Figures~\ref{fig:defer:Performance of RCU vs. Reference Counting} -and~\ref{fig:defer:Performance of Preemptible RCU vs. Reference Counting}, +\cref{fig:defer:Performance of RCU vs. Reference Counting,% +fig:defer:Performance of Preemptible RCU vs. Reference Counting}, again showing data taken on a 448-CPU 2.1\,GHz Intel x86 system for non-preemptible and preemptible Linux-kernel RCU, respectively. Non-preemptible RCU's advantage over reference counting ranges from @@ -887,7 +886,7 @@ one CPU up to about three orders of magnitude at 192~CPUs. However, as with reader-writer locking, the performance advantages of RCU are most pronounced for short-duration critical sections and for large numbers of CPUs, as shown in -Figure~\ref{fig:defer:Response Time of RCU vs. Reference Counting} +\cref{fig:defer:Response Time of RCU vs. Reference Counting} for the same system. In addition, as with reader-writer locking, many system calls (and thus any RCU read-side critical sections that they contain) complete in @@ -983,7 +982,7 @@ Gamsa et al.~\cite{Gamsa99} discuss existence guarantees and describe how a mechanism resembling RCU can be used to provide these existence guarantees (see Section~5 on page 7 of the PDF), and -Section~\ref{sec:locking:Lock-Based Existence Guarantees} +\cref{sec:locking:Lock-Based Existence Guarantees} discusses how to guarantee existence via locking, along with the ensuing disadvantages of doing so. The effect is that if any RCU-protected data element is accessed @@ -1026,27 +1025,27 @@ int delete(int key) \end{listing} \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking] -Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking} +\Cref{lst:defer:Existence Guarantees Enable Per-Element Locking} demonstrates how RCU-based existence guarantees can enable per-element locking via a function that deletes an element from a hash table. 
-Line~\lnref{hash} computes a hash function, and line~\lnref{rdlock} enters an RCU +\Clnref{hash} computes a hash function, and \clnref{rdlock} enters an RCU read-side critical section. -If line~\lnref{chkkey} finds that the corresponding bucket of the hash table is +If \clnref{chkkey} finds that the corresponding bucket of the hash table is empty or that the element present is not the one we wish to delete, -then line~\lnref{rdunlock1} exits the RCU read-side critical section and -line~\lnref{ret_0:a} +then \clnref{rdunlock1} exits the RCU read-side critical section and +\clnref{ret_0:a} indicates failure. \end{fcvref} \QuickQuiz{ What if the element we need to delete is not the first element of the list on - line~\ref{ln:defer:Existence Guarantees Enable Per-Element Locking:chkkey} of - Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking}? + \clnrefr{ln:defer:Existence Guarantees Enable Per-Element Locking:chkkey} of + \cref{lst:defer:Existence Guarantees Enable Per-Element Locking}? }\QuickQuizAnswer{ As with - Listing~\ref{lst:locking:Per-Element Locking Without Existence Guarantees}, + \cref{lst:locking:Per-Element Locking Without Existence Guarantees}, this is a very simple hash table with no chaining, so the only element in a given bucket is the first element. The reader is again invited to adapt this example to a hash table with @@ -1056,29 +1055,29 @@ indicates failure. }\QuickQuizEnd \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking] -Otherwise, line~\lnref{acq} acquires the update-side spinlock, and -line~\lnref{chkkey2} then checks that the element is still the one that we want. 
-If so, line~\lnref{rdunlock2} leaves the RCU read-side critical section, -line~\lnref{remove} removes it from the table, line~\lnref{rel1} releases -the lock, line~\lnref{sync_rcu} waits for all pre-existing RCU read-side critical -sections to complete, line~\lnref{kfree} frees the newly removed element, -and line~\lnref{ret_1} indicates success. -If the element is no longer the one we want, line~\lnref{rel2} releases -the lock, line~\lnref{rdunlock3} leaves the RCU read-side critical section, -and line~\lnref{ret_0:b} indicates failure to delete the specified key. +Otherwise, \clnref{acq} acquires the update-side spinlock, and +\clnref{chkkey2} then checks that the element is still the one that we want. +If so, \clnref{rdunlock2} leaves the RCU read-side critical section, +\clnref{remove} removes it from the table, \clnref{rel1} releases +the lock, \clnref{sync_rcu} waits for all pre-existing RCU read-side critical +sections to complete, \clnref{kfree} frees the newly removed element, +and \clnref{ret_1} indicates success. +If the element is no longer the one we want, \clnref{rel2} releases +the lock, \clnref{rdunlock3} leaves the RCU read-side critical section, +and \clnref{ret_0:b} indicates failure to delete the specified key. \end{fcvref} \QuickQuizSeries{% \QuickQuizB{ \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking] Why is it OK to exit the RCU read-side critical section on - line~\lnref{rdunlock2} of - Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking} - before releasing the lock on line~\lnref{rel1}? + \clnref{rdunlock2} of + \cref{lst:defer:Existence Guarantees Enable Per-Element Locking} + before releasing the lock on \clnref{rel1}? 
\end{fcvref} }\QuickQuizAnswerB{ \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking] - First, please note that the second check on line~\lnref{chkkey2} is + First, please note that the second check on \clnref{chkkey2} is necessary because some other CPU might have removed this element while we were waiting to acquire the lock. @@ -1101,9 +1100,9 @@ and line~\lnref{ret_0:b} indicates failure to delete the specified key. \QuickQuizE{ \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking] Why not exit the RCU read-side critical section on - line~\lnref{rdunlock3} of - Listing~\ref{lst:defer:Existence Guarantees Enable Per-Element Locking} - before releasing the lock on line~\lnref{rel2}? + \clnref{rdunlock3} of + \cref{lst:defer:Existence Guarantees Enable Per-Element Locking} + before releasing the lock on \clnref{rel2}? \end{fcvref} }\QuickQuizAnswerE{ Suppose we reverse the order of these two lines. @@ -1112,24 +1111,24 @@ and line~\lnref{ret_0:b} indicates failure to delete the specified key. \begin{enumerate} \begin{fcvref}[ln:defer:Existence Guarantees Enable Per-Element Locking] \item CPU~0 invokes \co{delete()}, and finds the element - to be deleted, executing through line~\lnref{rdunlock2}. + to be deleted, executing through \clnref{rdunlock2}. It has not yet actually deleted the element, but is about to do so. \item CPU~1 concurrently invokes \co{delete()}, attempting to delete this same element. However, CPU~0 still holds the lock, so CPU~1 waits - for it at line~\lnref{acq}. - \item CPU~0 executes lines~\lnref{remove} and~\lnref{rel1}, - and blocks at line~\lnref{sync_rcu} waiting for CPU~1 + for it at \clnref{acq}. + \item CPU~0 executes \clnref{remove,rel1}, + and blocks at \clnref{sync_rcu} waiting for CPU~1 to exit its RCU read-side critical section. 
- \item CPU~1 now acquires the lock, but the test on line~\lnref{chkkey2} + \item CPU~1 now acquires the lock, but the test on \clnref{chkkey2} fails because CPU~0 has already removed the element. - CPU~1 now executes line~\lnref{rel2} - (which we switched with line~\lnref{rdunlock3} + CPU~1 now executes \clnref{rel2} + (which we switched with \clnref{rdunlock3} for the purposes of this Quick Quiz) and exits its RCU read-side critical section. \item CPU~0 can now return from \co{synchronize_rcu()}, - and thus executes line~\lnref{kfree}, sending the element to + and thus executes \clnref{kfree}, sending the element to the freelist. \item CPU~1 now attempts to release a lock for an element that has been freed, and, worse yet, possibly @@ -1143,10 +1142,10 @@ and line~\lnref{ret_0:b} indicates failure to delete the specified key. Alert readers will recognize this as only a slight variation on the original ``RCU is a way of waiting for things to finish'' theme, which is addressed in -Section~\ref{sec:defer:RCU is a Way of Waiting for Things to Finish}. +\cref{sec:defer:RCU is a Way of Waiting for Things to Finish}. They might also note the deadlock-immunity advantages over the lock-based existence guarantees discussed in -Section~\ref{sec:locking:Lock-Based Existence Guarantees}. +\cref{sec:locking:Lock-Based Existence Guarantees}. \subsubsection{RCU Provides Type-Safe Memory} \label{sec:defer:RCU Provides Type-Safe Memory} @@ -1183,7 +1182,7 @@ for the duration of any pre-existing RCU read-side critical sections. during which at least one thread is always in an RCU read-side critical section. However, the key words in the description in - Section~\ref{sec:defer:RCU Provides Type-Safe Memory} + \cref{sec:defer:RCU Provides Type-Safe Memory} are ``in-use'' and ``pre-existing''. Keep in mind that a given RCU read-side critical section is conceptually only permitted to gain references to data elements @@ -1234,7 +1233,7 @@ Simpler is after all almost always better! 
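The control flow of the \co{delete()} function walked through above can be
exercised directly.  The following is a minimal, single-threaded userspace
C sketch: the RCU primitives and the per-element spinlock are stubbed out
as no-ops purely so the flow compiles and runs anywhere, and the
chaining-free toy hash table matches the listing; in the kernel these
would be the real \co{rcu_read_lock()}, \co{rcu_read_unlock()},
\co{synchronize_rcu()}, and spinlock primitives.

```c
#include <stdlib.h>

/* Stubs for the single-threaded sketch; real primitives in the kernel. */
#define rcu_read_lock()		do { } while (0)
#define rcu_read_unlock()	do { } while (0)
#define synchronize_rcu()	do { } while (0)
#define spin_lock(lp)		do { (void)(lp); } while (0)
#define spin_unlock(lp)		do { (void)(lp); } while (0)

#define NBUCKETS 16

struct element {
	int key;
	int lock;		/* stand-in for a spinlock_t */
};

static struct element *hashtable[NBUCKETS];

static int hashfn(int key)
{
	return key % NBUCKETS;
}

/* As in the listing: find under RCU, lock the element, re-check that it
 * is still the one we want, unlink it, wait for readers, then free. */
int delete(int key)
{
	int b = hashfn(key);
	struct element *p;

	rcu_read_lock();
	p = hashtable[b];
	if (p == NULL || p->key != key) {
		rcu_read_unlock();
		return 0;
	}
	spin_lock(&p->lock);
	if (p == hashtable[b] && p->key == key) {
		rcu_read_unlock();
		hashtable[b] = NULL;	/* no chaining in this toy table */
		spin_unlock(&p->lock);
		synchronize_rcu();	/* wait for pre-existing readers */
		free(p);
		return 1;
	}
	spin_unlock(&p->lock);
	rcu_read_unlock();
	return 0;
}

int insert(int key)
{
	int b = hashfn(key);
	struct element *p;

	if (hashtable[b] != NULL)
		return 0;
	p = malloc(sizeof(*p));
	if (p == NULL)
		return 0;
	p->key = key;
	p->lock = 0;
	hashtable[b] = p;
	return 1;
}
```

Because the stubs make the sketch single-threaded, it demonstrates only the
control flow (including the second key check after lock acquisition), not
the concurrency properties that the real primitives provide.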
\subsubsection{RCU is a Way of Waiting for Things to Finish}
\label{sec:defer:RCU is a Way of Waiting for Things to Finish}

-As noted in Section~\ref{sec:defer:RCU Fundamentals}
+As noted in \cref{sec:defer:RCU Fundamentals}
an important component of RCU is a way of waiting for RCU readers to
finish.
One of
@@ -1273,7 +1272,7 @@ In this example, the \co{timer_stop} function uses
\co{synchronize_sched()} to ensure that all in-flight NMI notifications
have completed before freeing the associated resources.
A simplified version of this code is shown
-Listing~\ref{lst:defer:Using RCU to Wait for NMIs to Finish}.
+in \cref{lst:defer:Using RCU to Wait for NMIs to Finish}.

\begin{listing}
\begin{fcvlabel}[ln:defer:Using RCU to Wait for NMIs to Finish]
@@ -1314,7 +1313,7 @@ void nmi_stop(void) \lnlbl@nmi_stop:b$
\begin{fcvref}[ln:defer:Using RCU to Wait for NMIs to Finish:struct]
\Clnrefrange{b}{e} define a \co{profile_buffer} structure, containing a
size and an indefinite array of entries.
-Line~\lnref{buf} defines a pointer to a profile buffer, which is
+\Clnref{buf} defines a pointer to a profile buffer, which is
presumably initialized elsewhere to point to a dynamically allocated
region of memory.
\end{fcvref}
@@ -1326,14 +1325,14 @@ As such, it cannot be preempted, nor can it be interrupted by a normal
interrupt handler.
However, it is still subject to delays due to cache misses,
ECC errors, and cycle stealing by other hardware threads within the same
core.
-Line~\lnref{rcu_deref} gets a local pointer to the profile buffer using the +\Clnref{rcu_deref} gets a local pointer to the profile buffer using the \co{rcu_dereference()} primitive to ensure memory ordering on DEC Alpha, and -lines~\lnref{if_NULL} and~\lnref{ret:a} exit from this function if there is no -profile buffer currently allocated, while lines~\lnref{if_oor} and~\lnref{ret:b} +\clnref{if_NULL,ret:a} exit from this function if there is no +profile buffer currently allocated, while \clnref{if_oor,ret:b} exit from this function if the \co{pcvalue} argument is out of range. -Otherwise, line~\lnref{inc} increments the profile-buffer entry indexed +Otherwise, \clnref{inc} increments the profile-buffer entry indexed by the \co{pcvalue} argument. Note that storing the size with the buffer guarantees that the range check matches the buffer, even if a large buffer is suddenly @@ -1344,15 +1343,15 @@ replaced by a smaller one. \Clnrefrange{b}{e} define the \co{nmi_stop()} function, where the caller is responsible for mutual exclusion (for example, holding the correct lock). -Line~\lnref{fetch} fetches a pointer to the profile buffer, and -lines~\lnref{if_NULL} and~\lnref{ret} exit the function if there is no buffer. -Otherwise, line~\lnref{NULL} \co{NULL}s out the profile-buffer pointer +\Clnref{fetch} fetches a pointer to the profile buffer, and +\clnref{if_NULL,ret} exit the function if there is no buffer. +Otherwise, \clnref{NULL} \co{NULL}s out the profile-buffer pointer (using the \co{rcu_assign_pointer()} primitive to maintain memory ordering on weakly ordered machines), -and line~\lnref{sync_sched} waits for an RCU Sched grace period to elapse, +and \clnref{sync_sched} waits for an RCU Sched grace period to elapse, in particular, waiting for all non-preemptible regions of code, including NMI handlers, to complete. 
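The buffer-replacement pattern just described can also be compiled and run
in userspace.  The sketch below stubs the RCU Sched primitives as no-ops
so that the single-threaded control flow is testable; the
\co{nmi_start()} helper is an invented convenience for allocating the
buffer (the listing assumes it is initialized elsewhere), and in the
kernel the stubs would be the real \co{rcu_dereference()},
\co{rcu_assign_pointer()}, and \co{synchronize_sched()}.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Stubs for the single-threaded sketch; real primitives in the kernel. */
#define rcu_dereference(p)		(p)
#define rcu_assign_pointer(p, v)	((p) = (v))
#define synchronize_sched()		do { } while (0)

struct profile_buffer {
	long size;
	atomic_long entry[];	/* flexible array of counters */
};

static struct profile_buffer *buf;

/* Analog of nmi_profile(): the range check uses the size stored with
 * the buffer itself, so check and buffer always match. */
void nmi_profile(unsigned long pcvalue)
{
	struct profile_buffer *p = rcu_dereference(buf);

	if (p == NULL)
		return;
	if (pcvalue >= (unsigned long)p->size)
		return;
	atomic_fetch_add(&p->entry[pcvalue], 1);
}

/* Analog of nmi_stop(): caller provides mutual exclusion. */
void nmi_stop(void)
{
	struct profile_buffer *p = buf;

	if (p == NULL)
		return;
	rcu_assign_pointer(buf, NULL);
	synchronize_sched();	/* wait for in-flight handlers */
	free(p);
}

/* Hypothetical helper: allocate and publish a zeroed buffer. */
struct profile_buffer *nmi_start(long size)
{
	struct profile_buffer *p;

	p = calloc(1, sizeof(*p) + size * sizeof(p->entry[0]));
	if (p != NULL)
		p->size = size;
	rcu_assign_pointer(buf, p);
	return p;
}
```

Note how \co{nmi_profile()} degenerates to a silent no-op both for
out-of-range \co{pcvalue} arguments and once \co{nmi_stop()} has
\co{NULL}ed the pointer, which is exactly the behavior the grace-period
wait relies on.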
-Once execution continues at line~\lnref{kfree}, we are guaranteed that +Once execution continues at \clnref{kfree}, we are guaranteed that any instance of \co{nmi_profile()} that obtained a pointer to the old buffer has returned. It is therefore safe to free the buffer, in this case using the @@ -1368,7 +1367,7 @@ It is therefore safe to free the buffer, in this case using the in \co{nmi_profile()}, and to replace the \co{synchronize_sched()} with \co{synchronize_rcu()}, perhaps as shown in - Listing~\ref{lst:defer:Using RCU to Wait for Mythical Preemptible NMIs to Finish}. + \cref{lst:defer:Using RCU to Wait for Mythical Preemptible NMIs to Finish}. % \begin{listing} \begin{VerbatimL} @@ -1446,7 +1445,7 @@ as well as for any of a number of other synchronization primitives. \end{figure} In the meantime, -Figure~\ref{fig:defer:RCU Areas of Applicability} +\cref{fig:defer:RCU Areas of Applicability} shows some rough rules of thumb on where RCU is most helpful. As shown in the blue box at the top of the figure, RCU works best if @@ -1500,7 +1499,7 @@ update-mostly workloads requiring consistent data are rarely good places to use RCU, though there are some exceptions~\cite{MathieuDesnoyers2012URCU}. In addition, as noted in -Section~\ref{sec:defer:RCU Provides Type-Safe Memory}, +\cref{sec:defer:RCU Provides Type-Safe Memory}, within the Linux kernel, the \co{SLAB_TYPESAFE_BY_RCU} slab-allocator flag provides type-safe memory to RCU readers, which can greatly simplify \IXacrl{nbs} and other lockless diff --git a/defer/updates.tex b/defer/updates.tex index 6e9ea0de..8b231011 100644 --- a/defer/updates.tex +++ b/defer/updates.tex @@ -17,7 +17,7 @@ scalability for writers. We have already seen one situation featuring high performance and scalability for writers, namely the counting algorithms surveyed in -Chapter~\ref{chp:Counting}. +\cref{chp:Counting}. 
+\cref{chp:Counting}.
These algorithms featured partially partitioned data structures so that updates can operate locally, while the more-expensive reads must sum across the entire data structure. @@ -35,7 +35,7 @@ of the garbage collector. And of course, where feasible, fully partitioned or ``sharded'' systems provide excellent performance and scalability, as noted in -Chapter~\ref{cha:Partitioning and Synchronization Design}. +\cref{cha:Partitioning and Synchronization Design}. The next chapter will look at updates in the context of several types of data structures. diff --git a/defer/whichtochoose.tex b/defer/whichtochoose.tex index 8f2921b4..3940c4bc 100644 --- a/defer/whichtochoose.tex +++ b/defer/whichtochoose.tex @@ -9,9 +9,9 @@ may be; custom will soon render it easy and agreeable.} {\emph{Pythagoras}} -Section~\ref{sec:defer:Which to Choose? (Overview)} +\Cref{sec:defer:Which to Choose? (Overview)} provides a high-level overview and then -Section~\ref{sec:defer:Which to Choose? (Details)} +\cref{sec:defer:Which to Choose? (Details)} provides a more detailed view of the differences between the deferred-processing techniques presented in this chapter. @@ -19,7 +19,7 @@ This discussion assumes a linked data structure that is large enough that readers do not hold references from one traversal to another, and where elements might be added to and removed from the structure at any location and at any time. -Section~\ref{sec:defer:Which to Choose? (Production Use)} +\Cref{sec:defer:Which to Choose? (Production Use)} then points out a few publicly visible production uses of hazard pointers, sequence locking, and RCU\@. This discussion should help you to make an informed choice between @@ -74,12 +74,12 @@ these techniques. \label{tab:defer:Which Deferred Technique to Choose? (Overview)} \end{table*} -Table~\ref{tab:defer:Which Deferred Technique to Choose? (Overview)} +\Cref{tab:defer:Which Deferred Technique to Choose? 
(Overview)}
shows a few high-level properties that distinguish the
deferred-reclamation techniques from one another.

The ``Readers'' row summarizes the results presented in
-Figure~\ref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
+\cref{fig:defer:Pre-BSD Routing Table Protected by RCU QSBR},
which shows that all but reference counting enjoy reasonably fast
and scalable readers.
@@ -135,7 +135,7 @@ run concurrently with any update.
	Yes they do, but the guarantee only applies unconditionally
	in cases where a reference is already held.
	With this in mind, please review the paragraph at the beginning of
-	Section~\ref{sec:defer:Which to Choose?}, especially the part
+	\cref{sec:defer:Which to Choose?}, especially the part
	saying ``large enough that readers do not hold references from
	one traversal to another''.
}\QuickQuizEnd
@@ -252,7 +252,7 @@ But those wishing more detail should continue on to the next section.
\label{tab:defer:Which Deferred Technique to Choose? (Details)}
\end{table*}

-Table~\ref{tab:defer:Which Deferred Technique to Choose? (Details)}
+\Cref{tab:defer:Which Deferred Technique to Choose? (Details)}
provides more-detailed rules of thumb that can help you choose among the
four deferred-processing techniques presented in this chapter.
@@ -293,7 +293,7 @@ be reduced by batching, so that each read-side operation covers more
data.

\QuickQuiz{
	But didn't the answer to one of the quick quizzes in
-	Section~\ref{sec:defer:Hazard Pointers}
+	\cref{sec:defer:Hazard Pointers}
	say that pairwise asymmetric barriers could eliminate the
	read-side \co{smp_mb()} from hazard pointers?
}\QuickQuizAnswer{
@@ -380,7 +380,7 @@ counting is used.
This means that the complexity of reference-acquisition failure only
needs to be dealt with for those few data elements:
The bulk of the reference acquisitions are unconditional, courtesy of
RCU\@.
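The combination just described, unconditional RCU protection for the
traversal plus conditional reference acquisition for only the final
element, can be sketched in userspace C as follows.  The RCU read-side
markers are stubbed as no-ops, and \co{get_ref_unless_zero()} is a
hypothetical stand-in modeled on the kernel's
\co{kref_get_unless_zero()} idiom; this is an illustration of the
pattern, not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stubs for the single-threaded sketch; real primitives in the kernel. */
#define rcu_read_lock()		do { } while (0)
#define rcu_read_unlock()	do { } while (0)

struct node {
	int key;
	atomic_int refcnt;	/* 0 means the node is being torn down */
	struct node *next;
};

/* Conditionally acquire a reference: fails once refcnt has hit zero. */
static int get_ref_unless_zero(struct node *p)
{
	int old = atomic_load(&p->refcnt);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&p->refcnt, &old, old + 1))
			return 1;
	}
	return 0;
}

/* Traverse unconditionally under RCU; only the element actually being
 * handed to the caller has its reference count bumped, and that one
 * acquisition is the only place failure must be handled. */
struct node *lookup_get(struct node *head, int key)
{
	struct node *p;

	rcu_read_lock();
	for (p = head; p != NULL; p = p->next) {
		if (p->key == key) {
			if (!get_ref_unless_zero(p))
				p = NULL;	/* dying; caller may retry */
			break;
		}
	}
	rcu_read_unlock();
	return p;
}
```

A caller receiving \co{NULL} for a key it expected to find would simply
retry or report failure, confining reference-acquisition-failure
complexity to this one spot, as noted above.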
-See Section~\ref{sec:together:Refurbish Reference Counting} +See \cref{sec:together:Refurbish Reference Counting} for more information on combining reference counting with other synchronization mechanisms. -- 2.17.1