>From d3f43d97fdbcbe68b6e5936b8d94c50b78bfeaaf Mon Sep 17 00:00:00 2001 From: Akira Yokosawa <akiyks@xxxxxxxxx> Date: Mon, 10 Oct 2017 22:18:17 +0900 Subject: [PATCH] Convert code snippets to 'listing' env (howto, toolsoftrade, count) Also fix a typo referring table by "Figure" in count.tex. Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx> --- Hi Paul, Looks like we have some time to do the conversion to "listing" environments before the upcoming release. This is the 1st round of the conversion. Thanks, Akira -- count/count.tex | 276 +++++++++++++++++++++--------------------- defer/rcuexercises.tex | 2 +- howto/howto.tex | 20 +-- owned/owned.tex | 28 ++--- together/applyrcu.tex | 16 +-- toolsoftrade/toolsoftrade.tex | 140 ++++++++++----------- 6 files changed, 241 insertions(+), 241 deletions(-) diff --git a/count/count.tex b/count/count.tex index a213558..9638971 100644 --- a/count/count.tex +++ b/count/count.tex @@ -165,7 +165,7 @@ are more appropriate for advanced students. \section{Why Isn't Concurrent Counting Trivial?} \label{sec:count:Why Isn't Concurrent Counting Trivial?} -\begin{figure}[bp] +\begin{listing}[bp] { \scriptsize \begin{verbbox} 1 long counter = 0; @@ -184,12 +184,12 @@ are more appropriate for advanced students. \centering \theverbbox \caption{Just Count!} -\label{fig:count:Just Count!} -\end{figure} +\label{lst:count:Just Count!} +\end{listing} Let's start with something simple, for example, the straightforward use of arithmetic shown in -Figure~\ref{fig:count:Just Count!} (\path{count_nonatomic.c}). +Listing~\ref{lst:count:Just Count!} (\path{count_nonatomic.c}). Here, we have a counter on line~1, we increment it on line~5, and we read out its value on line~10. What could be simpler? @@ -242,7 +242,7 @@ accuracies far greater than 50\,\% are almost always necessary. Furthermore, proofs can have bugs just as easily as programs can! } \QuickQuizEnd -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 atomic_t counter = ATOMIC_INIT(0); @@ -261,8 +261,8 @@ accuracies far greater than 50\,\% are almost always necessary. \centering \theverbbox \caption{Just Count Atomically!} -\label{fig:count:Just Count Atomically!} -\end{figure} +\label{lst:count:Just Count Atomically!} +\end{listing} \begin{figure}[tb] \centering @@ -273,7 +273,7 @@ accuracies far greater than 50\,\% are almost always necessary. The straightforward way to count accurately is to use atomic operations, as shown in -Figure~\ref{fig:count:Just Count Atomically!} (\path{count_atomic.c}). +Listing~\ref{lst:count:Just Count Atomically!} (\path{count_atomic.c}). Line~1 defines an atomic variable, line~5 atomically increments it, and line~10 reads it out. Because this is atomic, it keeps perfect count. @@ -491,7 +491,7 @@ thread (presumably cache aligned and padded to avoid false sharing). Section~\ref{sec:count:Per-Thread-Variable-Based Implementation}. } \QuickQuizEnd -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 DEFINE_PER_THREAD(long, counter); @@ -515,11 +515,11 @@ thread (presumably cache aligned and padded to avoid false sharing). \centering \theverbbox \caption{Array-Based Per-Thread Statistical Counters} -\label{fig:count:Array-Based Per-Thread Statistical Counters} -\end{figure} +\label{lst:count:Array-Based Per-Thread Statistical Counters} +\end{listing} Such an array can be wrapped into per-thread primitives, as shown in -Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters} +Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters} (\path{count_stat.c}). Line~1 defines an array containing a set of per-thread counters of type \co{long} named, creatively enough, \co{counter}. @@ -559,7 +559,7 @@ normal loads suffice, and no special atomic instructions are required. \QuickQuiz{} How does the per-thread \co{counter} variable in - Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters} get initialized? \QuickQuizAnswer{ The C standard specifies that the initial value of @@ -573,7 +573,7 @@ normal loads suffice, and no special atomic instructions are required. \QuickQuiz{} How is the code in - Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters} supposed to permit more than one counter? \QuickQuizAnswer{ Indeed, this toy example does not support more than one counter. @@ -602,7 +602,7 @@ at the beginning of this chapter. and during that time, the counter could well be changing. This means that the value returned by \co{read_count()} in - Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters} will not necessarily be exact. Assume that the counter is being incremented at rate $r$ counts per unit time, and that \co{read_count()}'s @@ -743,7 +743,7 @@ date, however, once updates cease, the global counter will eventually converge on the true value---hence this approach qualifies as eventually consistent. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 DEFINE_PER_THREAD(unsigned long, counter); @@ -802,11 +802,11 @@ eventually consistent. \centering \theverbbox \caption{Array-Based Per-Thread Eventually Consistent Counters} -\label{fig:count:Array-Based Per-Thread Eventually Consistent Counters} -\end{figure} +\label{lst:count:Array-Based Per-Thread Eventually Consistent Counters} +\end{listing} The implementation is shown in -Figure~\ref{fig:count:Array-Based Per-Thread Eventually Consistent Counters} +Listing~\ref{lst:count:Array-Based Per-Thread Eventually Consistent Counters} (\path{count_stat_eventual.c}). Lines~1-2 show the per-thread variable and the global variable that track the counter's value, and line three shows \co{stopflag} @@ -814,7 +814,7 @@ which is used to coordinate termination (for the case where we want to terminate the program with an accurate counter value). The \co{inc_count()} function shown on lines~5-9 is similar to its counterpart in -Figure~\ref{fig:count:Array-Based Per-Thread Statistical Counters}. +Listing~\ref{lst:count:Array-Based Per-Thread Statistical Counters}. The \co{read_count()} function shown on lines~11-14 simply returns the value of the \co{global_count} variable. @@ -834,7 +834,7 @@ comes at the cost of the additional thread running \co{eventual()}. \QuickQuiz{} Why doesn't \co{inc_count()} in - Figure~\ref{fig:count:Array-Based Per-Thread Eventually Consistent Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Eventually Consistent Counters} need to use atomic instructions? After all, we now have multiple threads accessing the per-thread counters! @@ -872,7 +872,7 @@ comes at the cost of the additional thread running \co{eventual()}. \QuickQuiz{} Won't the single global thread in the function \co{eventual()} of - Figure~\ref{fig:count:Array-Based Per-Thread Eventually Consistent Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Eventually Consistent Counters} be just as severe a bottleneck as a global lock would be? \QuickQuizAnswer{ In this case, no. @@ -883,7 +883,7 @@ comes at the cost of the additional thread running \co{eventual()}. \QuickQuiz{} Won't the estimate returned by \co{read_count()} in - Figure~\ref{fig:count:Array-Based Per-Thread Eventually Consistent Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Eventually Consistent Counters} become increasingly inaccurate as the number of threads rises? \QuickQuizAnswer{ @@ -897,7 +897,7 @@ comes at the cost of the additional thread running \co{eventual()}. \QuickQuiz{} Given that in the eventually\-/consistent algorithm shown in - Figure~\ref{fig:count:Array-Based Per-Thread Eventually Consistent Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Eventually Consistent Counters} both reads and updates have extremely low overhead and are extremely scalable, why would anyone bother with the implementation described in @@ -934,7 +934,7 @@ comes at the cost of the additional thread running \co{eventual()}. \subsection{Per-Thread-Variable-Based Implementation} \label{sec:count:Per-Thread-Variable-Based Implementation} -\begin{figure}[tb] +\begin{listing}[tb] { \scriptsize \begin{verbbox} 1 long __thread counter = 0; @@ -984,13 +984,13 @@ comes at the cost of the additional thread running \co{eventual()}. \centering \theverbbox \caption{Per-Thread Statistical Counters} -\label{fig:count:Per-Thread Statistical Counters} -\end{figure} +\label{lst:count:Per-Thread Statistical Counters} +\end{listing} Fortunately, \GCC\ provides an \co{__thread} storage class that provides per-thread storage. This can be used as shown in -Figure~\ref{fig:count:Per-Thread Statistical Counters} (\path{count_end.c}) +Listing~\ref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c}) to implement a statistical counter that not only scales, but that also incurs little or no performance penalty to incrementers compared to simple non-atomic @@ -1058,7 +1058,7 @@ Finally, line~22 returns the sum. \QuickQuiz{} Doesn't the check for \co{NULL} on line~19 of - Figure~\ref{fig:count:Per-Thread Statistical Counters} + Listing~\ref{lst:count:Per-Thread Statistical Counters} add extra branch mispredictions? Why not have a variable set permanently to zero, and point unused counter-pointers to that variable rather than setting @@ -1074,7 +1074,7 @@ Finally, line~22 returns the sum. \QuickQuiz{} Why on earth do we need something as heavyweight as a \emph{lock} guarding the summation in the function \co{read_count()} in - Figure~\ref{fig:count:Per-Thread Statistical Counters}? + Listing~\ref{lst:count:Per-Thread Statistical Counters}? \QuickQuizAnswer{ Remember, when a thread exits, its per-thread variables disappear. Therefore, if we attempt to access a given thread's per-thread @@ -1105,7 +1105,7 @@ array to point to its per-thread \co{counter} variable. \QuickQuiz{} Why on earth do we need to acquire the lock in \co{count_register_thread()} in - Figure~\ref{fig:count:Per-Thread Statistical Counters}? + Listing~\ref{lst:count:Per-Thread Statistical Counters}? It is a single properly aligned machine-word store to a location that no other thread is modifying, so it should be atomic anyway, right? @@ -1150,7 +1150,7 @@ variables vanish when that thread exits. accessible, even if the corresponding CPU is offline---even if the corresponding CPU never existed and never will exist. -\begin{figure}[tb] +\begin{listing}[tb] { \scriptsize \begin{verbbox} 1 long __thread counter = 0; @@ -1196,12 +1196,12 @@ variables vanish when that thread exits. \centering \theverbbox \caption{Per-Thread Statistical Counters With Lockless Summation} -\label{fig:count:Per-Thread Statistical Counters With Lockless Summation} -\end{figure} +\label{lst:count:Per-Thread Statistical Counters With Lockless Summation} +\end{listing} One workaround is to ensure that each thread continues to exist until all threads are finished, as shown in - Figure~\ref{fig:count:Per-Thread Statistical Counters With Lockless Summation} + Listing~\ref{lst:count:Per-Thread Statistical Counters With Lockless Summation} (\path{count_tstat.c}). Analysis of this code is left as an exercise to the reader, however, please note that it does not fit well into the @@ -1388,7 +1388,7 @@ Section~\ref{sec:SMPdesign:Parallel Fastpath}. \subsection{Simple Limit Counter Implementation} \label{sec:count:Simple Limit Counter Implementation} -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 unsigned long __thread counter = 0; @@ -1403,8 +1403,8 @@ Section~\ref{sec:SMPdesign:Parallel Fastpath}. \centering \theverbbox \caption{Simple Limit Counter Variables} -\label{fig:count:Simple Limit Counter Variables} -\end{figure} +\label{lst:count:Simple Limit Counter Variables} +\end{listing} \begin{figure}[tb] \centering @@ -1413,7 +1413,7 @@ Section~\ref{sec:SMPdesign:Parallel Fastpath}. \label{fig:count:Simple Limit Counter Variable Relationships} \end{figure} -Figure~\ref{fig:count:Simple Limit Counter Variables} +Listing~\ref{lst:count:Simple Limit Counter Variables} shows both the per-thread and global variables used by this implementation. The per-thread \co{counter} and \co{countermax} variables are the @@ -1443,7 +1443,7 @@ spinlock guards all of the global variables, in other words, no thread is permitted to access or modify any of the global variables unless it has acquired \co{gblcnt_mutex}. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 int add_count(unsigned long delta) @@ -1501,17 +1501,17 @@ has acquired \co{gblcnt_mutex}. \centering \theverbbox \caption{Simple Limit Counter Add, Subtract, and Read} -\label{fig:count:Simple Limit Counter Add, Subtract, and Read} -\end{figure} +\label{lst:count:Simple Limit Counter Add, Subtract, and Read} +\end{listing} -Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read} +Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read} shows the \co{add_count()}, \co{sub_count()}, and \co{read_count()} functions (\path{count_lim.c}). \QuickQuiz{} Why does - Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read} + Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read} provide \co{add_count()} and \co{sub_count()} instead of the \co{inc_count()} and \co{dec_count()} interfaces show in Section~\ref{sec:count:Statistical Counters}? @@ -1531,7 +1531,7 @@ references only per-thread variables, and should not incur any cache misses. \QuickQuiz{} What is with the strange form of the condition on line~3 of - Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}? + Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}? Why not the following more intuitive form of the fastpath? \vspace{5pt} @@ -1552,7 +1552,7 @@ references only per-thread variables, and should not incur any cache misses. Try the above formulation with \co{counter} equal to 10 and \co{delta} equal to \co{ULONG_MAX}. Then try it again with the code shown in - Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}. + Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}. A good understanding of integer overflow will be required for the rest of this example, so if you have never dealt with @@ -1566,7 +1566,7 @@ If the test on line~3 fails, we must access global variables, and thus must acquire \co{gblcnt_mutex} on line~7, which we release on line~11 in the failure case or on line~16 in the success case. Line~8 invokes \co{globalize_count()}, shown in -Figure~\ref{fig:count:Simple Limit Counter Utility Functions}, +Listing~\ref{lst:count:Simple Limit Counter Utility Functions}, which clears the thread-local variables, adjusting the global variables as needed, thus simplifying global processing. (But don't take \emph{my} word for it, try coding it yourself!) @@ -1581,7 +1581,7 @@ returns indicating failure. Otherwise, we take the slowpath. Line~14 adds \co{delta} to \co{globalcount}, and then line~15 invokes \co{balance_count()} (shown in -Figure~\ref{fig:count:Simple Limit Counter Utility Functions}) +Listing~\ref{lst:count:Simple Limit Counter Utility Functions}) in order to update both the global and the per-thread variables. This call to \co{balance_count()} will usually set this thread's \co{countermax} to re-enable the fastpath. @@ -1592,7 +1592,7 @@ line~17 returns indicating success. \QuickQuiz{} Why does \co{globalize_count()} zero the per-thread variables, only to later call \co{balance_count()} to refill them in - Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}? + Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}? Why not just leave the per-thread variables non-zero? \QuickQuizAnswer{ That is in fact what an earlier version of this code did. @@ -1616,7 +1616,7 @@ Because the slowpath must access global state, line~26 acquires \co{gblcnt_mutex}, which is released either by line~29 (in case of failure) or by line~34 (in case of success). Line~27 invokes \co{globalize_count()}, shown in -Figure~\ref{fig:count:Simple Limit Counter Utility Functions}, +Listing~\ref{lst:count:Simple Limit Counter Utility Functions}, which again clears the thread-local variables, adjusting the global variables as needed. Line~28 checks to see if the counter can accommodate subtracting @@ -1626,7 +1626,7 @@ Line~28 checks to see if the counter can accommodate subtracting \QuickQuiz{} Given that \co{globalreserve} counted against us in \co{add_count()}, why doesn't it count for us in \co{sub_count()} in - Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}? + Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}? \QuickQuizAnswer{ The \co{globalreserve} variable tracks the sum of all threads' \co{countermax} variables. @@ -1641,7 +1641,7 @@ Line~28 checks to see if the counter can accommodate subtracting \QuickQuiz{} Suppose that one thread invokes \co{add_count()} shown in - Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}, + Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}, and then another thread invokes \co{sub_count()}. Won't \co{sub_count()} return failure even though the value of the counter is non-zero? @@ -1658,14 +1658,14 @@ If, on the other hand, line~28 finds that the counter \emph{can} accommodate subtracting \co{delta}, we complete the slowpath. Line~32 does the subtraction and then line~33 invokes \co{balance_count()} (shown in -Figure~\ref{fig:count:Simple Limit Counter Utility Functions}) +Listing~\ref{lst:count:Simple Limit Counter Utility Functions}) in order to update both global and per-thread variables (hopefully re-enabling the fastpath). Then line~34 releases \co{gblcnt_mutex}, and line~35 returns success. \QuickQuiz{} Why have both \co{add_count()} and \co{sub_count()} in - Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}? + Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}? Why not simply pass a negative number to \co{add_count()}? \QuickQuizAnswer{ Given that \co{add_count()} takes an \co{unsigned} \co{long} @@ -1686,7 +1686,7 @@ Line~44 initializes local variable \co{sum} to the value of per-thread \co{counter} variables. Line~49 then returns the sum. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 static void globalize_count(void) @@ -1732,13 +1732,13 @@ Line~49 then returns the sum. \centering \theverbbox \caption{Simple Limit Counter Utility Functions} -\label{fig:count:Simple Limit Counter Utility Functions} -\end{figure} +\label{lst:count:Simple Limit Counter Utility Functions} +\end{listing} -Figure~\ref{fig:count:Simple Limit Counter Utility Functions} +Listing~\ref{lst:count:Simple Limit Counter Utility Functions} shows a number of utility functions used by the \co{add_count()}, \co{sub_count()}, and \co{read_count()} primitives shown in -Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}. +Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}. Lines~1-7 show \co{globalize_count()}, which zeros the current thread's per-thread counters, adjusting the global variables appropriately. @@ -1782,7 +1782,7 @@ Finally, in either case, line~18 makes the corresponding adjustment to \QuickQuiz{} Why set \co{counter} to \co{countermax / 2} in line~15 of - Figure~\ref{fig:count:Simple Limit Counter Utility Functions}? + Listing~\ref{lst:count:Simple Limit Counter Utility Functions}? Wouldn't it be simpler to just take \co{countermax} counts? \QuickQuizAnswer{ First, it really is reserving \co{countermax} counts @@ -1902,7 +1902,7 @@ This task is undertaken in the next section. \subsection{Approximate Limit Counter Implementation} \label{sec:count:Approximate Limit Counter Implementation} -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 unsigned long __thread counter = 0; @@ -1918,10 +1918,10 @@ This task is undertaken in the next section. \centering \theverbbox \caption{Approximate Limit Counter Variables} -\label{fig:count:Approximate Limit Counter Variables} -\end{figure} +\label{lst:count:Approximate Limit Counter Variables} +\end{listing} -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 static void balance_count(void) @@ -1942,25 +1942,25 @@ This task is undertaken in the next section. \centering \theverbbox \caption{Approximate Limit Counter Balancing} -\label{fig:count:Approximate Limit Counter Balancing} -\end{figure} +\label{lst:count:Approximate Limit Counter Balancing} +\end{listing} Because this implementation (\path{count_lim_app.c}) is quite similar to that in the previous section -(Figures~\ref{fig:count:Simple Limit Counter Variables}, -\ref{fig:count:Simple Limit Counter Add, Subtract, and Read}, and -\ref{fig:count:Simple Limit Counter Utility Functions}), +(Listings~\ref{lst:count:Simple Limit Counter Variables}, +\ref{lst:count:Simple Limit Counter Add, Subtract, and Read}, and +\ref{lst:count:Simple Limit Counter Utility Functions}), only the changes are shown here. -Figure~\ref{fig:count:Approximate Limit Counter Variables} +Listing~\ref{lst:count:Approximate Limit Counter Variables} is identical to -Figure~\ref{fig:count:Simple Limit Counter Variables}, +Listing~\ref{lst:count:Simple Limit Counter Variables}, with the addition of \co{MAX_COUNTERMAX}, which sets the maximum permissible value of the per-thread \co{countermax} variable. Similarly, -Figure~\ref{fig:count:Approximate Limit Counter Balancing} +Listing~\ref{lst:count:Approximate Limit Counter Balancing} is identical to the \co{balance_count()} function in -Figure~\ref{fig:count:Simple Limit Counter Utility Functions}, +Listing~\ref{lst:count:Simple Limit Counter Utility Functions}, with the addition of lines~6 and 7, which enforce the \co{MAX_COUNTERMAX} limit on the per-thread \co{countermax} variable. @@ -2038,7 +2038,7 @@ represent \co{counter} and the low-order 16 bits to represent the reader. } \QuickQuizEnd -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 atomic_t __thread ctrandmax = ATOMIC_INIT(0); @@ -2079,12 +2079,12 @@ represent \co{counter} and the low-order 16 bits to represent \centering \theverbbox \caption{Atomic Limit Counter Variables and Access Functions} -\label{fig:count:Atomic Limit Counter Variables and Access Functions} -\end{figure} +\label{lst:count:Atomic Limit Counter Variables and Access Functions} +\end{listing} The variables and access functions for a simple atomic limit counter are shown in -Figure~\ref{fig:count:Atomic Limit Counter Variables and Access Functions} +Listing~\ref{lst:count:Atomic Limit Counter Variables and Access Functions} (\path{count_lim_atomic.c}). The \co{counter} and \co{countermax} variables in earlier algorithms are combined into the single variable \co{ctrandmax} shown on @@ -2096,7 +2096,7 @@ representation of \co{int}. Lines~2-6 show the definitions for \co{globalcountmax}, \co{globalcount}, \co{globalreserve}, \co{counterp}, and \co{gblcnt_mutex}, all of which take on roles similar to their counterparts in -Figure~\ref{fig:count:Approximate Limit Counter Variables}. +Listing~\ref{lst:count:Approximate Limit Counter Variables}. Line~7 defines \co{CM_BITS}, which gives the number of bits in each half of \co{ctrandmax}, and line~8 defines \co{MAX_COUNTERMAX}, which gives the maximum value that may be held in either half of @@ -2104,7 +2104,7 @@ gives the maximum value that may be held in either half of \QuickQuiz{} In what way does line~7 of - Figure~\ref{fig:count:Atomic Limit Counter Variables and Access Functions} + Listing~\ref{lst:count:Atomic Limit Counter Variables and Access Functions} violate the C standard? \QuickQuizAnswer{ It assumes eight bits per byte. @@ -2134,7 +2134,7 @@ it on line~24. \QuickQuiz{} Given that there is only one \co{ctrandmax} variable, why bother passing in a pointer to it on line~18 of - Figure~\ref{fig:count:Atomic Limit Counter Variables and Access Functions}? + Listing~\ref{lst:count:Atomic Limit Counter Variables and Access Functions}? \QuickQuizAnswer{ There is only one \co{ctrandmax} variable \emph{per thread}. Later, we will see code that needs to pass other threads' @@ -2149,7 +2149,7 @@ the result. \QuickQuiz{} Why does \co{merge_ctrandmax()} in - Figure~\ref{fig:count:Atomic Limit Counter Variables and Access Functions} + Listing~\ref{lst:count:Atomic Limit Counter Variables and Access Functions} return an \co{int} rather than storing directly into an \co{atomic_t}? \QuickQuizAnswer{ @@ -2157,7 +2157,7 @@ the result. to the \co{atomic_cmpxchg()} primitive. } \QuickQuizEnd -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 int add_count(unsigned long delta) @@ -2228,10 +2228,10 @@ the result. \centering \theverbbox \caption{Atomic Limit Counter Add and Subtract} -\label{fig:count:Atomic Limit Counter Add and Subtract} -\end{figure} +\label{lst:count:Atomic Limit Counter Add and Subtract} +\end{listing} -Figure~\ref{fig:count:Atomic Limit Counter Add and Subtract} +Listing~\ref{lst:count:Atomic Limit Counter Add and Subtract} shows the \co{add_count()} and \co{sub_count()} functions. Lines~1-32 show \co{add_count()}, whose fastpath spans lines~8-15, @@ -2256,7 +2256,7 @@ execution continues in the loop at line~9. \QuickQuiz{} Yecch! Why the ugly \co{goto} on line~11 of - Figure~\ref{fig:count:Atomic Limit Counter Add and Subtract}? + Listing~\ref{lst:count:Atomic Limit Counter Add and Subtract}? Haven't you heard of the \co{break} statement??? \QuickQuizAnswer{ Replacing the \co{goto} with a \co{break} would require keeping @@ -2271,20 +2271,20 @@ execution continues in the loop at line~9. \QuickQuiz{} Why would the \co{atomic_cmpxchg()} primitive at lines~13-14 of - Figure~\ref{fig:count:Atomic Limit Counter Add and Subtract} + Listing~\ref{lst:count:Atomic Limit Counter Add and Subtract} ever fail? After all, we picked up its old value on line~9 and have not changed it! \QuickQuizAnswer{ Later, we will see how the \co{flush_local_count()} function in - Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1} + Listing~\ref{lst:count:Atomic Limit Counter Utility Functions 1} might update this thread's \co{ctrandmax} variable concurrently with the execution of the fastpath on lines~8-14 of - Figure~\ref{fig:count:Atomic Limit Counter Add and Subtract}. + Listing~\ref{lst:count:Atomic Limit Counter Add and Subtract}. } \QuickQuizEnd Lines~16-31 of -Figure~\ref{fig:count:Atomic Limit Counter Add and Subtract} +Listing~\ref{lst:count:Atomic Limit Counter Add and Subtract} show \co{add_count()}'s slowpath, which is protected by \co{gblcnt_mutex}, which is acquired on line~17 and released on lines~24 and 30. Line~18 invokes \co{globalize_count()}, which moves this thread's @@ -2304,14 +2304,14 @@ spreads counts to the local state if appropriate, line~30 releases returns success. Lines~34-63 of -Figure~\ref{fig:count:Atomic Limit Counter Add and Subtract} +Listing~\ref{lst:count:Atomic Limit Counter Add and Subtract} show \co{sub_count()}, which is structured similarly to \co{add_count()}, having a fastpath on lines~41-48 and a slowpath on lines~49-62. A line-by-line analysis of this function is left as an exercise to the reader. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 unsigned long read_count(void) @@ -2337,10 +2337,10 @@ the reader. \centering \theverbbox \caption{Atomic Limit Counter Read} -\label{fig:count:Atomic Limit Counter Read} -\end{figure} +\label{lst:count:Atomic Limit Counter Read} +\end{listing} -Figure~\ref{fig:count:Atomic Limit Counter Read} shows \co{read_count()}. +Listing~\ref{lst:count:Atomic Limit Counter Read} shows \co{read_count()}. Line~9 acquires \co{gblcnt_mutex} and line~16 releases it. Line~10 initializes local variable \co{sum} to the value of \co{globalcount}, and the loop spanning lines~11-15 adds the @@ -2348,7 +2348,7 @@ per-thread counters to this sum, isolating each per-thread counter using \co{split_ctrandmax} on line~13. Finally, line~17 returns the sum. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 static void globalize_count(void) @@ -2388,10 +2388,10 @@ Finally, line~17 returns the sum. \centering \theverbbox \caption{Atomic Limit Counter Utility Functions 1} -\label{fig:count:Atomic Limit Counter Utility Functions 1} -\end{figure} +\label{lst:count:Atomic Limit Counter Utility Functions 1} +\end{listing} -\begin{figure}[tb] +\begin{listing}[tb] { \scriptsize \begin{verbbox} 1 static void balance_count(void) @@ -2440,11 +2440,11 @@ Finally, line~17 returns the sum. \centering \theverbbox \caption{Atomic Limit Counter Utility Functions 2} -\label{fig:count:Atomic Limit Counter Utility Functions 2} -\end{figure} +\label{lst:count:Atomic Limit Counter Utility Functions 2} +\end{listing} -Figures~\ref{fig:count:Atomic Limit Counter Utility Functions 1} -and~\ref{fig:count:Atomic Limit Counter Utility Functions 2} +Listings~\ref{lst:count:Atomic Limit Counter Utility Functions 1} +and~\ref{lst:count:Atomic Limit Counter Utility Functions 2} shows the utility functions \co{globalize_count()}, \co{flush_local_count()}, @@ -2452,7 +2452,7 @@ shows the utility functions \co{count_register_thread()}, and \co{count_unregister_thread()}. The code for \co{globalize_count()} is shown on lines~1-12, -of Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1} and +of Listing~\ref{lst:count:Atomic Limit Counter Utility Functions 1} and is similar to that of previous algorithms, with the addition of line~7, which is now required to split out \co{counter} and \co{countermax} from \co{ctrandmax}. @@ -2477,7 +2477,7 @@ line~30 subtracts this thread's \co{countermax} from \co{globalreserve}. What stops a thread from simply refilling its \co{ctrandmax} variable immediately after \co{flush_local_count()} on line~14 of - Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1} + Listing~\ref{lst:count:Atomic Limit Counter Utility Functions 1} empties it? \QuickQuizAnswer{ This other thread cannot refill its \co{ctrandmax} @@ -2494,7 +2494,7 @@ line~30 subtracts this thread's \co{countermax} from \co{globalreserve}. \co{add_count()} or \co{sub_count()} from interfering with the \co{ctrandmax} variable while \co{flush_local_count()} is accessing it on line~27 of - Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 1} + Listing~\ref{lst:count:Atomic Limit Counter Utility Functions 1} empties it? \QuickQuizAnswer{ Nothing. @@ -2520,7 +2520,7 @@ line~30 subtracts this thread's \co{countermax} from \co{globalreserve}. } \QuickQuizEnd Lines~1-22 on -Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 2} +Listing~\ref{lst:count:Atomic Limit Counter Utility Functions 2} show the code for \co{balance_count()}, which refills the calling thread's local \co{ctrandmax} variable. This function is quite similar to that of the preceding algorithms, @@ -2534,7 +2534,7 @@ line~33. Given that the \co{atomic_set()} primitive does a simple store to the specified \co{atomic_t}, how can line~21 of \co{balance_count()} in - Figure~\ref{fig:count:Atomic Limit Counter Utility Functions 2} + Listing~\ref{lst:count:Atomic Limit Counter Utility Functions 2} work correctly in face of concurrent \co{flush_local_count()} updates to this variable? \QuickQuizAnswer{ @@ -2668,7 +2668,7 @@ The slowpath then sets that thread's \co{theft} state to IDLE. \subsection{Signal-Theft Limit Counter Implementation} \label{sec:count:Signal-Theft Limit Counter Implementation} -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 #define THEFT_IDLE 0 @@ -2693,10 +2693,10 @@ The slowpath then sets that thread's \co{theft} state to IDLE. \centering \theverbbox \caption{Signal-Theft Limit Counter Data} -\label{fig:count:Signal-Theft Limit Counter Data} -\end{figure} +\label{lst:count:Signal-Theft Limit Counter Data} +\end{listing} -Figure~\ref{fig:count:Signal-Theft Limit Counter Data} +Listing~\ref{lst:count:Signal-Theft Limit Counter Data} (\path{count_lim_sig.c}) shows the data structures used by the signal-theft based counter implementation. @@ -2706,7 +2706,7 @@ Lines~8-17 are similar to earlier implementations, with the addition of lines~14 and 15 to allow remote access to a thread's \co{countermax} and \co{theft} variables, respectively. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 static void globalize_count(void) @@ -2777,10 +2777,10 @@ and \co{theft} variables, respectively. \centering \theverbbox \caption{Signal-Theft Limit Counter Value-Migration Functions} -\label{fig:count:Signal-Theft Limit Counter Value-Migration Functions} -\end{figure} +\label{lst:count:Signal-Theft Limit Counter Value-Migration Functions} +\end{listing} -Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions} +Listing~\ref{lst:count:Signal-Theft Limit Counter Value-Migration Functions} shows the functions responsible for migrating counts between per-thread variables and the global variables. Lines~1-7 shows \co{globalize_count()}, which is identical to earlier @@ -2796,7 +2796,7 @@ this thread's fastpaths are not running, line~16 sets the \co{theft} state to READY. \QuickQuiz{} - In Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions} + In Listing~\ref{lst:count:Signal-Theft Limit Counter Value-Migration Functions} function \co{flush_local_count_sig()}, why are there \co{ACCESS_ONCE()} wrappers around the uses of the \co{theft} per-thread variable? @@ -2835,7 +2835,7 @@ Otherwise, line~32 sets the thread's \co{theft} state to REQ and line~33 sends the thread a signal. \QuickQuiz{} - In Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions}, + In Listing~\ref{lst:count:Signal-Theft Limit Counter Value-Migration Functions}, why is it safe for line~28 to directly access the other thread's \co{countermax} variable? \QuickQuizAnswer{ @@ -2849,7 +2849,7 @@ line~33 sends the thread a signal. } \QuickQuizEnd \QuickQuiz{} - In Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions}, + In Listing~\ref{lst:count:Signal-Theft Limit Counter Value-Migration Functions}, why doesn't line~33 check for the current thread sending itself a signal? \QuickQuizAnswer{ @@ -2861,7 +2861,7 @@ line~33 sends the thread a signal. \QuickQuiz{} The code in - Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions}, + Listing~\ref{lst:count:Signal-Theft Limit Counter Value-Migration Functions}, works with \GCC\ and POSIX. What would be required to make it also conform to the ISO C standard? \QuickQuizAnswer{ @@ -2882,7 +2882,7 @@ READY, so lines~43-46 do the thieving. Line~47 then sets the thread's \co{theft} state back to IDLE. \QuickQuiz{} - In Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions}, why does line~41 resend the signal? + In Listing~\ref{lst:count:Signal-Theft Limit Counter Value-Migration Functions}, why does line~41 resend the signal? \QuickQuizAnswer{ Because many operating systems over several decades have had the property of losing the occasional signal. @@ -2897,7 +2897,7 @@ Line~47 then sets the thread's \co{theft} state back to IDLE. Lines~51-63 show \co{balance_count()}, which is similar to that of earlier examples. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 int add_count(unsigned long delta) @@ -2941,10 +2941,10 @@ earlier examples. \centering \theverbbox \caption{Signal-Theft Limit Counter Add Function} -\label{fig:count:Signal-Theft Limit Counter Add Function} -\end{figure} +\label{lst:count:Signal-Theft Limit Counter Add Function} +\end{listing} -\begin{figure}[tb] +\begin{listing}[tb] { \scriptsize \begin{verbbox} 38 int sub_count(unsigned long delta) @@ -2986,10 +2986,10 @@ earlier examples. \centering \theverbbox \caption{Signal-Theft Limit Counter Subtract Function} -\label{fig:count:Signal-Theft Limit Counter Subtract Function} -\end{figure} +\label{lst:count:Signal-Theft Limit Counter Subtract Function} +\end{listing} -Figure~\ref{fig:count:Signal-Theft Limit Counter Add Function} +Listing~\ref{lst:count:Signal-Theft Limit Counter Add Function} shows the \co{add_count()} function. The fastpath spans lines~5-20, and the slowpath lines~21-35. Line~5 sets the per-thread \co{counting} variable to 1 so that @@ -3014,7 +3014,7 @@ READY also sees the effects of line~9. If the fastpath addition at line~9 was executed, then line~20 returns success. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 unsigned long read_count(void) @@ -3035,21 +3035,21 @@ success. \centering \theverbbox \caption{Signal-Theft Limit Counter Read Function} -\label{fig:count:Signal-Theft Limit Counter Read Function} -\end{figure} +\label{lst:count:Signal-Theft Limit Counter Read Function} +\end{listing} Otherwise, we fall through to the slowpath starting at line~21. The structure of the slowpath is similar to those of earlier examples, so its analysis is left as an exercise to the reader. Similarly, the structure of \co{sub_count()} on -Figure~\ref{fig:count:Signal-Theft Limit Counter Subtract Function} +Listing~\ref{lst:count:Signal-Theft Limit Counter Subtract Function} is the same as that of \co{add_count()}, so the analysis of \co{sub_count()} is also left as an exercise for the reader, as is the analysis of \co{read_count()} in -Figure~\ref{fig:count:Signal-Theft Limit Counter Read Function}. +Listing~\ref{lst:count:Signal-Theft Limit Counter Read Function}. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 void count_init(void) @@ -3092,11 +3092,11 @@ Figure~\ref{fig:count:Signal-Theft Limit Counter Read Function}. \centering \theverbbox \caption{Signal-Theft Limit Counter Initialization Functions} -\label{fig:count:Signal-Theft Limit Counter Initialization Functions} -\end{figure} +\label{lst:count:Signal-Theft Limit Counter Initialization Functions} +\end{listing} Lines~1-12 of -Figure~\ref{fig:count:Signal-Theft Limit Counter Initialization Functions} +Listing~\ref{lst:count:Signal-Theft Limit Counter Initialization Functions} show \co{count_init()}, which set up \co{flush_local_count_sig()} as the signal handler for \co{SIGUSR1}, enabling the \co{pthread_kill()} calls in \co{flush_local_count()} @@ -3414,7 +3414,7 @@ courtesy of eventual consistency. \label{tab:count:Limit Counter Performance on Power-6} \end{table*} -Figure~\ref{tab:count:Limit Counter Performance on Power-6} +Table~\ref{tab:count:Limit Counter Performance on Power-6} shows the performance of the parallel limit-counting algorithms. Exact enforcement of the limits incurs a substantial performance penalty, although on this 4.7\,GHz \Power{6} system that penalty can be reduced diff --git a/defer/rcuexercises.tex b/defer/rcuexercises.tex index 1d9e138..21d9683 100644 --- a/defer/rcuexercises.tex +++ b/defer/rcuexercises.tex @@ -14,7 +14,7 @@ suffice for most of these exercises. \QuickQuiz{} The statistical-counter implementation shown in - Figure~\ref{fig:count:Per-Thread Statistical Counters} + Listing~\ref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c}) used a global lock to guard the summation in \co{read_count()}, which resulted in poor performance and negative scalability. diff --git a/howto/howto.tex b/howto/howto.tex index 90abb14..1a0b75c 100644 --- a/howto/howto.tex +++ b/howto/howto.tex @@ -361,7 +361,7 @@ Other types of systems have well-known ways of locating files by filename. \section{Whose Book Is This?} \label{sec:howto:Whose Book Is This?} -\begin{figure*}[tbp] +\begin{listing*}[tbp] { \scriptsize \begin{verbbox} @@ -376,10 +376,10 @@ Other types of systems have well-known ways of locating files by filename. } \hspace*{1in}\OneColumnHSpace{-0.5in}\theverbbox \caption{Creating an Up-To-Date PDF} -\label{fig:howto:Creating a Up-To-Date PDF} -\end{figure*} +\label{lst:howto:Creating a Up-To-Date PDF} +\end{listing*} -\begin{figure*}[tbp] +\begin{listing*}[tbp] { \scriptsize \begin{verbbox} @@ -393,8 +393,8 @@ Other types of systems have well-known ways of locating files by filename. } \hspace*{1in}\OneColumnHSpace{-0.5in}\theverbbox \caption{Generating an Updated PDF} -\label{fig:howto:Generating an Updated PDF} -\end{figure*} +\label{lst:howto:Generating an Updated PDF} +\end{listing*} As the cover says, the editor is one Paul E.~McKenney. However, the editor does accept contributions via the @@ -416,18 +416,18 @@ in the file \path{FAQ-BUILD.txt} in the \LaTeX{} source to the book. To create and display a current \LaTeX{} source tree of this book, use the list of Linux commands shown in -Figure~\ref{fig:howto:Creating a Up-To-Date PDF}. +Listing~\ref{lst:howto:Creating a Up-To-Date PDF}. In some environments, the \co{evince} command that displays \path{perfbook.pdf} may need to be replaced, for example, with \co{acroread}. The \co{git clone} command need only be used the first time you create a PDF, subsequently, you can run the commands shown in -Figure~\ref{fig:howto:Generating an Updated PDF} to pull in any updates +Listing~\ref{lst:howto:Generating an Updated PDF} to pull in any updates and generate an updated PDF. The commands in -Figure~\ref{fig:howto:Generating an Updated PDF} +Listing~\ref{lst:howto:Generating an Updated PDF} must be run within the \path{perfbook} directory created by the commands shown in -Figure~\ref{fig:howto:Creating a Up-To-Date PDF}. +Listing~\ref{lst:howto:Creating a Up-To-Date PDF}. PDFs of this book are sporadically posted at \url{http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html} diff --git a/owned/owned.tex b/owned/owned.tex index f74b5d6..d584e82 100644 --- a/owned/owned.tex +++ b/owned/owned.tex @@ -132,8 +132,8 @@ is obviously quite attractive. } \QuickQuizEnd This same pattern can be written in C as well as in \co{sh}, as illustrated by -Figures~\ref{fig:toolsoftrade:Using the fork() Primitive} -and~\ref{fig:toolsoftrade:Using the wait() Primitive}. +Listings~\ref{lst:toolsoftrade:Using the fork() Primitive} +and~\ref{lst:toolsoftrade:Using the wait() Primitive}. The next section discusses use of data ownership in shared-memory parallel programs. @@ -150,8 +150,8 @@ of ownership and access rights. For example, consider the per-thread statistical counter implementation shown in -Figure~\ref{fig:count:Per-Thread Statistical Counters} on -page~\pageref{fig:count:Per-Thread Statistical Counters}. +Listing~\ref{lst:count:Per-Thread Statistical Counters} on +page~\pageref{lst:count:Per-Thread Statistical Counters}. Here, \co{inc_count()} updates only the corresponding thread's instance of \co{counter}, while \co{read_count()} accesses, but does not modify, all @@ -195,9 +195,9 @@ beginning on page~\pageref{sec:count:Signal-Theft Limit Counter Design}, in particular the \co{flush_local_count_sig()} and \co{flush_local_count()} functions in -Figure~\ref{fig:count:Signal-Theft Limit Counter Value-Migration Functions} +Listing~\ref{lst:count:Signal-Theft Limit Counter Value-Migration Functions} on -page~\pageref{fig:count:Signal-Theft Limit Counter Value-Migration Functions}. +page~\pageref{lst:count:Signal-Theft Limit Counter Value-Migration Functions}. The \co{flush_local_count_sig()} function is a signal handler that acts as the shipped function. @@ -207,12 +207,12 @@ the shipped function executes. This shipped function has the not-unusual added complication of needing to interact with any concurrently executing \co{add_count()} or \co{sub_count()} functions (see -Figure~\ref{fig:count:Signal-Theft Limit Counter Add Function} +Listing~\ref{lst:count:Signal-Theft Limit Counter Add Function} on -page~\pageref{fig:count:Signal-Theft Limit Counter Add Function} and -Figure~\ref{fig:count:Signal-Theft Limit Counter Subtract Function} +page~\pageref{lst:count:Signal-Theft Limit Counter Add Function} and +Listing~\ref{lst:count:Signal-Theft Limit Counter Subtract Function} on -page~\pageref{fig:count:Signal-Theft Limit Counter Subtract Function}). +page~\pageref{lst:count:Signal-Theft Limit Counter Subtract Function}). \QuickQuiz{} What mechanisms other than POSIX signals may be used for function @@ -246,7 +246,7 @@ The eventually consistent counter implementation described in Section~\ref{sec:count:Eventually Consistent Implementation} provides an example. This implementation has a designated thread that runs the \co{eventual()} function shown on lines~15-32 of -Figure~\ref{fig:count:Array-Based Per-Thread Eventually Consistent Counters}. +Listing~\ref{lst:count:Array-Based Per-Thread Eventually Consistent Counters}. This \co{eventual()} thread periodically pulls the per-thread counts into the global counter, so that accesses to the global counter will, as the name says, eventually converge on the actual value. @@ -254,7 +254,7 @@ as the name says, eventually converge on the actual value. \QuickQuiz{} But none of the data in the \co{eventual()} function shown on lines~15-32 of - Figure~\ref{fig:count:Array-Based Per-Thread Eventually Consistent Counters} + Listing~\ref{lst:count:Array-Based Per-Thread Eventually Consistent Counters} is actually owned by the \co{eventual()} thread! In just what way is this data ownership??? \QuickQuizAnswer{ @@ -298,8 +298,8 @@ a considerable reduction in the spread of certain types of disease. In other cases, privatization imposes costs. For example, consider the simple limit counter shown in -Figure~\ref{fig:count:Simple Limit Counter Add, Subtract, and Read} on -page~\pageref{fig:count:Simple Limit Counter Add, Subtract, and Read}. +Listing~\ref{lst:count:Simple Limit Counter Add, Subtract, and Read} on +page~\pageref{lst:count:Simple Limit Counter Add, Subtract, and Read}. This is an example of an algorithm where threads can read each others' data, but are only permitted to update their own data. A quick review of the algorithm shows that the only cross-thread diff --git a/together/applyrcu.tex b/together/applyrcu.tex index e3d6f0d..d477ab5 100644 --- a/together/applyrcu.tex +++ b/together/applyrcu.tex @@ -21,8 +21,8 @@ Unfortunately, threads needing to read out the value via \co{read_count()} were required to acquire a global lock, and thus incurred high overhead and suffered poor scalability. The code for the lock-based implementation is shown in -Figure~\ref{fig:count:Per-Thread Statistical Counters} on -Page~\pageref{fig:count:Per-Thread Statistical Counters}. +Listing~\ref{lst:count:Per-Thread Statistical Counters} on +Page~\pageref{lst:count:Per-Thread Statistical Counters}. \QuickQuiz{} Why on earth did we need that global lock in the first place? @@ -56,8 +56,8 @@ execution. Just what is the accuracy of \co{read_count()}, anyway? \QuickQuizAnswer{ Refer to - Figure~\ref{fig:count:Per-Thread Statistical Counters} on - Page~\pageref{fig:count:Per-Thread Statistical Counters}. + Listing~\ref{lst:count:Per-Thread Statistical Counters} on + Page~\pageref{lst:count:Per-Thread Statistical Counters}. Clearly, if there are no concurrent invocations of \co{inc_count()}, \co{read_count()} will return an exact result. However, if there \emph{are} concurrent invocations of @@ -205,7 +205,7 @@ the current \co{countarray} structure, and the \co{final_mutex} spinlock. Lines~10-13 show \co{inc_count()}, which is unchanged from -Figure~\ref{fig:count:Per-Thread Statistical Counters}. +Listing~\ref{lst:count:Per-Thread Statistical Counters}. Lines~15-29 show \co{read_count()}, which has changed significantly. Lines~21 and~27 substitute \co{rcu_read_lock()} and @@ -280,13 +280,13 @@ Line~69 can then safely free the old \co{countarray} structure. Wow! Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters} contains 69 lines of code, compared to only 42 in - Figure~\ref{fig:count:Per-Thread Statistical Counters}. + Listing~\ref{lst:count:Per-Thread Statistical Counters}. Is this extra complexity really worth it? \QuickQuizAnswer{ This of course needs to be decided on a case-by-case basis. If you need an implementation of \co{read_count()} that scales linearly, then the lock-based implementation shown in - Figure~\ref{fig:count:Per-Thread Statistical Counters} + Listing~\ref{lst:count:Per-Thread Statistical Counters} simply will not work for you. On the other hand, if calls to \co{count_read()} are sufficiently rare, then the lock-based version is simpler and might thus be @@ -300,7 +300,7 @@ Line~69 can then safely free the old \co{countarray} structure. and the use of RCU unnecessary. This would in turn enable an implementation that was even simpler than the one shown in - Figure~\ref{fig:count:Per-Thread Statistical Counters}, but + Listing~\ref{lst:count:Per-Thread Statistical Counters}, but with all the scalability and performance benefits of the implementation shown in Figure~\ref{fig:together:RCU and Per-Thread Statistical Counters}! diff --git a/toolsoftrade/toolsoftrade.tex b/toolsoftrade/toolsoftrade.tex index bd43879..b18d3d0 100644 --- a/toolsoftrade/toolsoftrade.tex +++ b/toolsoftrade/toolsoftrade.tex @@ -189,7 +189,7 @@ These concerns can of course add substantial complexity to the code. For more information, see any of a number of textbooks on the subject~\cite{WRichardStevens1992,StewartWeiss2013UNIX}. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 pid = fork(); @@ -207,14 +207,14 @@ subject~\cite{WRichardStevens1992,StewartWeiss2013UNIX}. \centering \theverbbox \caption{Using the \tco{fork()} Primitive} -\label{fig:toolsoftrade:Using the fork() Primitive} -\end{figure} +\label{lst:toolsoftrade:Using the fork() Primitive} +\end{listing} If \co{fork()} succeeds, it returns twice, once for the parent and again for the child. The value returned from \co{fork()} allows the caller to tell the difference, as shown in -Figure~\ref{fig:toolsoftrade:Using the fork() Primitive} +Listing~\ref{lst:toolsoftrade:Using the fork() Primitive} (\path{forkjoin.c}). Line~1 executes the \co{fork()} primitive, and saves its return value in local variable \co{pid}. @@ -228,7 +228,7 @@ Otherwise, the \co{fork()} has executed successfully, and the parent therefore executes line~9 with the variable \co{pid} containing the process ID of the child. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 void waitall(void) @@ -251,8 +251,8 @@ process ID of the child. \centering \theverbbox \caption{Using the \tco{wait()} Primitive} -\label{fig:toolsoftrade:Using the wait() Primitive} -\end{figure} +\label{lst:toolsoftrade:Using the wait() Primitive} +\end{listing} The parent process may use the \co{wait()} primitive to wait for its children to complete. @@ -261,7 +261,7 @@ counterpart, as each invocation of \co{wait()} waits for but one child process. It is therefore customary to wrap \co{wait()} into a function similar to the \co{waitall()} function shown in -Figure~\ref{fig:toolsoftrade:Using the wait() Primitive} +Listing~\ref{lst:toolsoftrade:Using the wait() Primitive} (\path{api-pthread.h}), with this \co{waitall()} function having semantics similar to the shell-script \co{wait} command. @@ -283,14 +283,14 @@ Otherwise, lines~11 and 12 print an error and exit. child individually. In addition, some parallel applications need to detect the reason that the child died. - As we saw in Figure~\ref{fig:toolsoftrade:Using the wait() Primitive}, + As we saw in Listing~\ref{lst:toolsoftrade:Using the wait() Primitive}, it is not hard to build a \co{waitall()} function out of the \co{wait()} function, but it would be impossible to do the reverse. Once the information about a specific child is lost, it is lost. } \QuickQuizEnd -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 int x = 0; @@ -313,13 +313,13 @@ Otherwise, lines~11 and 12 print an error and exit. \centering \theverbbox \caption{Processes Created Via \tco{fork()} Do Not Share Memory} -\label{fig:toolsoftrade:Processes Created Via fork() Do Not Share Memory} -\end{figure} +\label{lst:toolsoftrade:Processes Created Via fork() Do Not Share Memory} +\end{listing} It is critically important to note that the parent and child do \emph{not} share memory. This is illustrated by the program shown in -Figure~\ref{fig:toolsoftrade:Processes Created Via fork() Do Not Share Memory} +Listing~\ref{lst:toolsoftrade:Processes Created Via fork() Do Not Share Memory} (\path{forkjoinvar.c}), in which the child sets a global variable \co{x} to 1 on line~6, prints a message on line~7, and exits on line~8. @@ -353,7 +353,7 @@ Parent process sees x=0 source code of the Linux-kernel implementations themselves. It is important to note that the parent process in - Figure~\ref{fig:toolsoftrade:Processes Created Via fork() Do Not Share Memory} + Listing~\ref{lst:toolsoftrade:Processes Created Via fork() Do Not Share Memory} waits until after the child terminates to do its \co{printf()}. Using \co{printf()}'s buffered I/O concurrently to the same file from multiple processes is non-trivial, and is best avoided. @@ -376,7 +376,7 @@ than fork-join parallelism. To create a thread within an existing process, invoke the \co{pthread_create()} primitive, for example, as shown on lines~15 and~16 of -Figure~\ref{fig:toolsoftrade:Threads Created Via pthread-create() Share Memory} +Listing~\ref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory} (\path{pcreate.c}). The first argument is a pointer to a \co{pthread_t} in which to store the ID of the thread to be created, the second \co{NULL} argument is a pointer @@ -385,7 +385,7 @@ to an optional \co{pthread_attr_t}, the third argument is the function that is to be invoked by the new thread, and the last \co{NULL} argument is the argument that will be passed to \co{mythread}. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 int x = 0; @@ -419,15 +419,15 @@ is the argument that will be passed to \co{mythread}. \centering \theverbbox \caption{Threads Created Via \tco{pthread_create()} Share Memory} -\label{fig:toolsoftrade:Threads Created Via pthread-create() Share Memory} -\end{figure} +\label{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory} +\end{listing} In this example, \co{mythread()} simply returns, but it could instead call \co{pthread_exit()}. \QuickQuiz{} If the \co{mythread()} function in - Figure~\ref{fig:toolsoftrade:Threads Created Via pthread-create() Share Memory} + Listing~\ref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory} can simply return, why bother with \co{pthread_exit()}? \QuickQuizAnswer{ In this simple example, there is no reason whatsoever. @@ -450,7 +450,7 @@ or the value returned by the thread's top-level function, depending on how the thread in question exits. The program shown in -Figure~\ref{fig:toolsoftrade:Threads Created Via pthread-create() Share Memory} +Listing~\ref{lst:toolsoftrade:Threads Created Via pthread-create() Share Memory} produces output as follows, demonstrating that memory is in fact shared between the two threads: @@ -541,7 +541,7 @@ lock~\cite{Hoare74}. to the reader. } \QuickQuizEnd -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; @@ -598,11 +598,11 @@ lock~\cite{Hoare74}. \centering \theverbbox \caption{Demonstration of Exclusive Locks} -\label{fig:toolsoftrade:Demonstration of Exclusive Locks} -\end{figure} +\label{lst:toolsoftrade:Demonstration of Exclusive Locks} +\end{listing} This exclusive-locking property is demonstrated using the code shown in -Figure~\ref{fig:toolsoftrade:Demonstration of Exclusive Locks} +Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks} (\path{lock.c}). Line~1 defines and initializes a POSIX lock named \co{lock_a}, while line~2 similarly defines and initializes a lock named \co{lock_b}. @@ -618,7 +618,7 @@ primitives. \QuickQuiz{} Why not simply make the argument to \co{lock_reader()} on line~5 of - Figure~\ref{fig:toolsoftrade:Demonstration of Exclusive Locks} + Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks} be a pointer to a \co{pthread_mutex_t}? \QuickQuizAnswer{ Because we will need to pass \co{lock_reader()} to @@ -653,7 +653,7 @@ required by \co{pthread_create()}. } \QuickQuizEnd Lines~31-49 of -Figure~\ref{fig:toolsoftrade:Demonstration of Exclusive Locks} +Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks} shows \co{lock_writer()}, which periodically update the shared variable \co{x} while holding the specified \co{pthread_mutex_t}. @@ -664,7 +664,7 @@ While holding the lock, lines~40-43 increment the shared variable \co{x}, sleeping for five milliseconds between each increment. Finally, lines~44-47 release the lock. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 printf("Creating two threads using same lock:\n"); @@ -691,10 +691,10 @@ Finally, lines~44-47 release the lock. \centering \theverbbox \caption{Demonstration of Same Exclusive Lock} -\label{fig:toolsoftrade:Demonstration of Same Exclusive Lock} -\end{figure} +\label{lst:toolsoftrade:Demonstration of Same Exclusive Lock} +\end{listing} -Figure~\ref{fig:toolsoftrade:Demonstration of Same Exclusive Lock} +Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock} shows a code fragment that runs \co{lock_reader()} and \co{lock_writer()} as threads using the same lock, namely, \co{lock_a}. Lines~2-6 create a thread running \co{lock_reader()}, and then @@ -719,7 +719,7 @@ by \co{lock_writer()} while holding the lock. \QuickQuiz{} Is ``x = 0'' the only possible output from the code fragment shown in - Figure~\ref{fig:toolsoftrade:Demonstration of Same Exclusive Lock}? + Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock}? If so, why? If not, what other output could appear, and why? \QuickQuizAnswer{ @@ -735,7 +735,7 @@ by \co{lock_writer()} while holding the lock. However, there are no guarantees, especially on a busy system. } \QuickQuizEnd -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 printf("Creating two threads w/different locks:\n"); @@ -763,10 +763,10 @@ by \co{lock_writer()} while holding the lock. \centering \theverbbox \caption{Demonstration of Different Exclusive Locks} -\label{fig:toolsoftrade:Demonstration of Different Exclusive Locks} -\end{figure} +\label{lst:toolsoftrade:Demonstration of Different Exclusive Locks} +\end{listing} -Figure~\ref{fig:toolsoftrade:Demonstration of Different Exclusive Locks} +Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks} shows a similar code fragment, but this time using different locks: \co{lock_a} for \co{lock_reader()} and \co{lock_b} for \co{lock_writer()}. @@ -811,7 +811,7 @@ values of \co{x} stored by \co{lock_writer()}. \QuickQuiz{} In the code shown in - Figure~\ref{fig:toolsoftrade:Demonstration of Different Exclusive Locks}, + Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}, is \co{lock_reader()} guaranteed to see all the values produced by \co{lock_writer()}? Why or why not? @@ -825,19 +825,19 @@ values of \co{x} stored by \co{lock_writer()}. \QuickQuiz{} Wait a minute here!!! - Figure~\ref{fig:toolsoftrade:Demonstration of Same Exclusive Lock} + Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock} didn't initialize shared variable \co{x}, so why does it need to be initialized in - Figure~\ref{fig:toolsoftrade:Demonstration of Different Exclusive Locks}? + Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks}? \QuickQuizAnswer{ See line~3 of - Figure~\ref{fig:toolsoftrade:Demonstration of Exclusive Locks}. + Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}. Because the code in - Figure~\ref{fig:toolsoftrade:Demonstration of Same Exclusive Lock} + Listing~\ref{lst:toolsoftrade:Demonstration of Same Exclusive Lock} ran first, it could rely on the compile-time initialization of \co{x}. The code in - Figure~\ref{fig:toolsoftrade:Demonstration of Different Exclusive Locks} + Listing~\ref{lst:toolsoftrade:Demonstration of Different Exclusive Locks} ran next, so it had to re-initialize \co{x}. } \QuickQuizEnd @@ -873,7 +873,7 @@ an arbitrarily large number of readers to concurrently hold the lock. However, in practice, we need to know how much additional scalability is provided by reader-writer locks. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 pthread_rwlock_t rwl = PTHREAD_RWLOCK_INITIALIZER; @@ -922,10 +922,10 @@ provided by reader-writer locks. \centering \theverbbox \caption{Measuring Reader-Writer Lock Scalability} -\label{fig:toolsoftrade:Measuring Reader-Writer Lock Scalability} -\end{figure} +\label{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability} +\end{listing} -Figure~\ref{fig:toolsoftrade:Measuring Reader-Writer Lock Scalability} +Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability} (\path{rwlockscale.c}) shows one way of measuring reader-writer lock scalability. Line~1 shows the definition and initialization of the reader-writer @@ -955,7 +955,7 @@ rights to assume that the value of \co{goflag} would never change. \QuickQuiz{} Instead of using \co{READ_ONCE()} everywhere, why not just declare \co{goflag} as \co{volatile} on line~10 of - Figure~\ref{fig:toolsoftrade:Measuring Reader-Writer Lock Scalability}? + Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}? \QuickQuizAnswer{ A \co{volatile} declaration is in fact a reasonable alternative in this particular case. @@ -977,7 +977,7 @@ rights to assume that the value of \co{goflag} would never change. Don't we also need memory barriers to make sure that the change in \co{goflag}'s value propagates to the CPU in a timely fashion in - Figure~\ref{fig:toolsoftrade:Measuring Reader-Writer Lock Scalability}? + Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}? \QuickQuizAnswer{ No, memory barriers are not needed and won't help here. Memory barriers only enforce ordering among multiple memory @@ -1167,7 +1167,7 @@ to protect the tiniest of critical sections. One such way are atomic operations. We have seen one atomic operations already, in the form of the \co{__sync_fetch_and_add()} primitive on line~18 of -Figure~\ref{fig:toolsoftrade:Measuring Reader-Writer Lock Scalability}. +Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}. This primitive atomically adds the value of its second argument to the value referenced by its first argument, returning the old value (which was ignored in this case). @@ -1243,11 +1243,11 @@ In some cases, it is sufficient to constrain the compiler's ability to reorder operations, while allowing the CPU free rein, in which case the \co{barrier()} primitive may be used, as it in fact was on line~28 of -Figure~\ref{fig:toolsoftrade:Measuring Reader-Writer Lock Scalability}. +Listing~\ref{lst:toolsoftrade:Measuring Reader-Writer Lock Scalability}. In some cases, it is only necessary to ensure that the compiler avoids optimizing away a given memory read, in which case the \co{READ_ONCE()} primitive may be used, as it was on line~17 of -Figure~\ref{fig:toolsoftrade:Demonstration of Exclusive Locks}. +Listing~\ref{lst:toolsoftrade:Demonstration of Exclusive Locks}. Similarly, the \co{WRITE_ONCE()} primitive may be used to prevent the compiler from optimizing away a given memory write. These last three primitives are not provided directly by \GCC, @@ -1416,10 +1416,10 @@ Threads share everything except for per-thread local state,\footnote{ which includes program counter and stack. The thread API is shown in -Figure~\ref{fig:toolsoftrade:Thread API}, and members are described in the +Listing~\ref{lst:toolsoftrade:Thread API}, and members are described in the following sections. -\begin{figure*}[tbp] +\begin{listing*}[tbp] { \scriptsize \begin{verbbox} int smp_thread_id(void) @@ -1433,8 +1433,8 @@ void wait_all_threads(void) \centering \theverbbox \caption{Thread API} -\label{fig:toolsoftrade:Thread API} -\end{figure*} +\label{lst:toolsoftrade:Thread API} +\end{listing*} \subsubsection{\tco{create_thread()}} @@ -1499,7 +1499,7 @@ a run, so such synchronization is normally not needed. \subsubsection{Example Usage} -Figure~\ref{fig:toolsoftrade:Example Child Thread} +Listing~\ref{lst:toolsoftrade:Example Child Thread} shows an example hello-world-like child thread. As noted earlier, each thread is allocated its own stack, so each thread has its own private \co{arg} argument and \co{myarg} variable. @@ -1509,7 +1509,7 @@ Note that the \co{return} statement on line~7 terminates the thread, returning a \co{NULL} to whoever invokes \co{wait_thread()} on this thread. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 void *thread_test(void *arg) @@ -1525,11 +1525,11 @@ thread. \centering \theverbbox \caption{Example Child Thread} -\label{fig:toolsoftrade:Example Child Thread} -\end{figure} +\label{lst:toolsoftrade:Example Child Thread} +\end{listing} The parent program is shown in -Figure~\ref{fig:toolsoftrade:Example Parent Thread}. +Listing~\ref{lst:toolsoftrade:Example Parent Thread}. It invokes \co{smp_init()} to initialize the threading system on line~6, parses arguments on lines~7-14, and announces its presence on line~15. @@ -1538,7 +1538,7 @@ and waits for them to complete on line~18. Note that \co{wait_all_threads()} discards the threads return values, as in this case they are all \co{NULL}, which is not very interesting. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} 1 int main(int argc, char *argv[]) @@ -1567,8 +1567,8 @@ as in this case they are all \co{NULL}, which is not very interesting. \centering \theverbbox \caption{Example Parent Thread} -\label{fig:toolsoftrade:Example Parent Thread} -\end{figure} +\label{lst:toolsoftrade:Example Parent Thread} +\end{listing} \QuickQuiz{} What happened to the Linux-kernel equivalents to \co{fork()} @@ -1584,11 +1584,11 @@ as in this case they are all \co{NULL}, which is not very interesting. \label{sec:toolsoftrade:Locking} A good starting subset of the Linux kernel's locking API is shown in -Figure~\ref{fig:toolsoftrade:Locking API}, +Listing~\ref{lst:toolsoftrade:Locking API}, each API element being described in the following sections. This book's CodeSamples locking API closely follows that of the Linux kernel. -\begin{figure}[tbp] +\begin{listing}[tbp] { \scriptsize \begin{verbbox} void spin_lock_init(spinlock_t *sp); @@ -1600,8 +1600,8 @@ void spin_unlock(spinlock_t *sp); \centering \theverbbox \caption{Locking API} -\label{fig:toolsoftrade:Locking API} -\end{figure} +\label{lst:toolsoftrade:Locking API} +\end{listing} \subsubsection{\tco{spin_lock_init()}} @@ -1721,7 +1721,7 @@ given per-CPU variable, \co{per_cpu()} to access a specified CPU's instance of a given per-CPU variable, along with many other special-purpose per-CPU operations. -Figure~\ref{fig:toolsoftrade:Per-Thread-Variable API} +Listing~\ref{lst:toolsoftrade:Per-Thread-Variable API} shows this book's per-thread-variable API, which is patterned after the Linux kernel's per-CPU-variable API. This API provides the per-thread equivalent of global variables. @@ -1729,7 +1729,7 @@ Although this API is, strictly speaking, not necessary\footnote{ You could instead use \co{__thread} or \co{_Thread_local}.}, it can provide a good userspace analogy to Linux kernel code. -\begin{figure}[htbp] +\begin{listing}[htbp] { \scriptsize \begin{verbbox} DEFINE_PER_THREAD(type, name) @@ -1742,8 +1742,8 @@ init_per_thread(name, v) \centering \theverbbox \caption{Per-Thread-Variable API} -\label{fig:toolsoftrade:Per-Thread-Variable API} -\end{figure} +\label{lst:toolsoftrade:Per-Thread-Variable API} +\end{listing} \QuickQuiz{} How could you work around the lack of a per-thread-variable -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe perfbook" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html