[PATCH v2] count: Switch from GCC to C11 thread-local storage

Elad Lahav <e2lahav@xxxxxxxxx> · Wed, 17 Aug 2022 07:00:50 -0400

Signed-off-by: Elad Lahav <e2lahav@xxxxxxxxx>
---
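Illustrative note, not part of the patch: the sketch below is a cut-down,
standalone variant of the pattern in count_end.c, meant only to show that
each thread gets its own instance of a _Thread_local variable. It assumes
a C11 compiler and POSIX threads. MAX_THREADS, worker(), and
count_with_threads() are hypothetical names that do not appear in the
book's sources, a pthread mutex stands in for the DEFINE_SPINLOCK() wrapper
from api.h, and the counterp[] array is omitted, so no reader can sample
the counters of running threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 8	/* hypothetical cap, for this sketch only */

/* Each thread that touches "counter" gets its own zero-initialized copy. */
static _Thread_local unsigned long counter;

/* Sum of the counters of threads that have finished. */
static unsigned long finalcount;
static pthread_mutex_t final_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
	unsigned long nincs = *(const unsigned long *)arg;
	unsigned long i;

	for (i = 0; i < nincs; i++)
		counter++;		/* updates this thread's copy only */

	/* Fold this thread's count into the total before exiting. */
	pthread_mutex_lock(&final_mutex);
	finalcount += counter;
	pthread_mutex_unlock(&final_mutex);
	return NULL;
}

/* Run nthreads workers, each incrementing nincs times; return the total. */
unsigned long count_with_threads(int nthreads, unsigned long nincs)
{
	pthread_t tid[MAX_THREADS];
	int i;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	finalcount = 0;
	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tid[i], NULL, worker, &nincs) != 0)
			abort();
	for (i = 0; i < nthreads; i++)
		if (pthread_join(tid[i], NULL) != 0)
			abort();
	return finalcount;
}
```

With these definitions, count_with_threads(4, 1000) returns 4000: each
worker increments only its own copy, and the copies are summed under the
mutex as the workers finish. Building it should need something like
-std=c11 -pthread.
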
 CodeSamples/count/count_end.c | 12 ++++++------
 count/count.tex               | 20 ++++++++++++--------
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/CodeSamples/count/count_end.c b/CodeSamples/count/count_end.c
index 722ad2f7..5e7b9ee1 100644
--- a/CodeSamples/count/count_end.c
+++ b/CodeSamples/count/count_end.c
@@ -1,6 +1,6 @@
 /*
  * count_end.c: Per-thread statistical counters that provide sum at end.
- *	Uses __thread for each thread's counter.
+ *	Uses _Thread_local for each thread's counter.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -23,17 +23,17 @@
 #include "../api.h"
 
 //\begin{snippet}[labelbase=ln:count:count_end:whole,commandchars=\\\@\$]
-unsigned long __thread counter = 0;		//\lnlbl{var:b}
+unsigned long _Thread_local counter = 0;		//\lnlbl{var:b}
 unsigned long *counterp[NR_THREADS] = { NULL };
 unsigned long finalcount = 0;
 DEFINE_SPINLOCK(final_mutex);			//\lnlbl{var:e}
 
-static __inline__ void inc_count(void)		//\lnlbl{inc:b}
+static inline void inc_count(void)		//\lnlbl{inc:b}
 {
 	WRITE_ONCE(counter, counter + 1);
 }						//\lnlbl{inc:e}
 
-static __inline__ unsigned long read_count(void)
+static inline unsigned long read_count(void)
 {
 	int t;
 	unsigned long sum;
@@ -48,7 +48,7 @@ static __inline__ unsigned long read_count(void)
 }
 
 #ifndef FCV_SNIPPET
-__inline__ void count_init(void)
+inline void count_init(void)
 {
 }
 #endif /* FCV_SNIPPET */
@@ -73,7 +73,7 @@ void count_unregister_thread(int nthreadsexpected)	//\lnlbl{unreg:b}
 }							//\lnlbl{unreg:e}
 //\end{snippet}
 
-__inline__ void count_cleanup(void)
+inline void count_cleanup(void)
 {
 }
 
diff --git a/count/count.tex b/count/count.tex
index 523789e2..775cf77e 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -490,7 +490,7 @@ thread (presumably cache aligned and padded to avoid false sharing).
 	It can, and in this toy implementation, it does.
 	But it is not that hard to come up with an alternative
 	implementation that permits an arbitrary number of threads,
-	for example, using \GCC's \co{__thread} facility,
+	for example, using C11's \co{_Thread_local} facility,
 	as shown in
 	\cref{sec:count:Per-Thread-Variable-Based Implementation}.
 }\QuickQuizEnd
@@ -721,8 +721,12 @@ This is the topic of the next section.
 \subsection{Per-Thread-Variable-Based Implementation}
 \label{sec:count:Per-Thread-Variable-Based Implementation}
 
-\GCC\ provides an \apig{__thread} storage class that provides
-per-thread storage.
+The C language, since C11, features a \apig{_Thread_local} storage class that
+provides per-thread storage.%
+\footnote{\GCC\ provides its own \apig{__thread} storage class, which was used
+in previous versions of this book.
+The two methods for specifying a thread-local variable are interchangeable
+when using \GCC\@.}
 This can be used as shown in
 \cref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c})
 to implement
@@ -749,14 +753,14 @@ value of the counter and exiting threads.
 	Doesn't that explicit \co{counterp} array in
 	\cref{lst:count:Per-Thread Statistical Counters}
 	reimpose an arbitrary limit on the number of threads?
-	Why doesn't \GCC\ provide a \co{per_thread()} interface, similar
+	Why doesn't the C language provide a \co{per_thread()} interface, similar
 	to the Linux kernel's \co{per_cpu()} primitive, to allow
 	threads to more easily access each others' per-thread variables?
 }\QuickQuizAnswer{
 	Why indeed?
 
-	To be fair, \GCC\ faces some challenges that the Linux kernel
-	gets to ignore.
+	To be fair, user-mode thread-local storage faces some challenges
+	that the Linux kernel gets to ignore.
 	When a user-level thread exits, its per-thread variables all
 	disappear, which complicates the problem of per-thread-variable
 	access, particularly before the advent of user-level RCU
@@ -940,7 +944,7 @@ variables vanish when that thread exits.
 	more graceful manner.
 }\QuickQuizEnd
 
-Both the array-based and \co{__thread}-based approaches offer excellent
+Both the array-based and \co{_Thread_local}-based approaches offer excellent
 update-side performance and scalability.
 However, these benefits result in large read-side expense for large
 numbers of threads.
@@ -2961,7 +2965,7 @@ courtesy of eventual consistency.
 	work when there are more threads.
 	In addition, the last two algorithms interpose an additional
 	level of indirection because they map from integer thread ID
-	to the corresponding \co{__thread} variable.
+	to the corresponding \co{_Thread_local} variable.
 }\QuickQuizEndB
 %
 \QuickQuizE{
-- 
2.25.1