Re: [PATCH v2 25/26] docs: rcu: convert some articles from html to ReST

Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> · Tue, 30 Jul 2019 19:15:16 -0400

On Tue, Jul 30, 2019 at 08:00:57PM -0300, Mauro Carvalho Chehab wrote:
> Em Tue, 30 Jul 2019 18:21:30 -0400
> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> escreveu:
> 
> > On Tue, Jul 30, 2019 at 07:00:28PM -0300, Mauro Carvalho Chehab wrote:
> > > Em Tue, 30 Jul 2019 17:50:07 -0400
> > > Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> escreveu:
> > >   
> > > > On Tue, Jul 30, 2019 at 02:22:50PM -0700, Paul E. McKenney wrote:  
> > > > > On Fri, Jul 26, 2019 at 09:51:35AM -0300, Mauro Carvalho Chehab wrote:    
> > > > > > There are 4 RCU articles that are written on html format.
> > > > > > 
> > > > > > The way they are, they can't be part of the Linux Kernel
> > > > > > documentation body nor share the styles and pdf output.
> > > > > > 
> > > > > > So, convert them to ReST format.
> > > > > > 
> > > > > > This way, make htmldocs and make pdfdocs will produce a
> > > > > > documentation output that will be like the original ones, but
> > > > > > will be part of the Linux Kernel documentation body.
> > > > > > 
> > > > > > Part of the conversion was done with the help of pandoc, but
> > > > > > the result had some broken things that had to be manually
> > > > > > fixed.
> > > > > > 
> > > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>    
> > > > > 
> > > > > I am having some trouble applying these, at least in part due to UTF-8
> > > > > sequences, for example double left quotation mark.  These end up being
> > > > > "=E2=80=9C", with a few space characters turned into "=20".
> > > > > 
> > > > > Any advice on how to apply these?  Should I just pull commits from
> > > > > somewhere?    
> > > > 
> > > > I was able to successfully apply and build this particular patch. I think
> > > > this is the only one in the series that applies to RCU.
> > > > 
> > > > Sadly, I can't find the patch in any of the public archives, but I could
> > > > perhaps email it to you as an .mbox attach which 'git am' should be able to
> > > > apply.
> > > > 
> > > > Mauro did say he was going to add some more details to changelog, or it could
> > > > be added when it is applied:
> > > > https://lore.kernel.org/rcu/20190726154550.5eeae294@xxxxxxxx/  
> > > 
> > > Yeah, I'm also planning to address at least some of the issues you
> > > pointed, in order to improve the html output, but got sidetracked by something 
> > > else and didn't find any time yet to finish. I'm adding some CI automation for
> > > the media subsystem in order to help us dealing with the huge amount of patches
> > > we receive there.
> > > 
> > > Feel free to add those details to the changelog. I may find some spare time 
> > > this week or the next one for the improvements you suggested, but those 
> > > could be sent on followup patches, once done.  
> > 
> > Ok, I will re-send this RCU patch with the changes, leave this one to me.
> > 
> > The other memory model one, needs a lot more work so we can keep that aside
> > for now till someone has the time.
> 
> Yeah, feel free to do what fits best with regards to this one.
> 
> If you prefer to merge it instead, I'm enclosing the last version of it,
> with that quick hack I just wrote, plus the change of the toctree for
> it to produce the 2-level index.

Ok thanks a lot Mauro! It looks much better to me now. I can probably send
this one out too to Paul after the RCU ones are in, after some more careful
review..

However, I am curious what Paul and Alan think about it since some folks such
as PeterZ actively deal with memory model stuff and probably refer a lot to
this document, but don't like ReST (unlike us) and prefer plain text editors
without any markup. Hopefully this is not the case any more..

thanks,

 - Joel

> Thanks,
> Mauro
> 
> [PATCH] tools: memory-model: add it to the Documentation body
> 
> The books at tools/memory-model/Documentation are very well
> formatted. Congrats to the ones that wrote them!
> 
> The manual conversion to ReST is really trivial:
> 
> 	- Add document titles;
> 	- change the bullets on some lists;
> 	- mark code blocks.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
> 
> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> index 976e85adffe8..7e6500a6fa76 100644
> --- a/Documentation/core-api/refcount-vs-atomic.rst
> +++ b/Documentation/core-api/refcount-vs-atomic.rst
> @@ -17,7 +17,7 @@ in order to help maintainers validate their code against the change in
>  these memory ordering guarantees.
>  
>  The terms used through this document try to follow the formal LKMM defined in
> -tools/memory-model/Documentation/explanation.txt.
> +tools/memory-model/Documentation/explanation.rst.
>  
>  memory-barriers.txt and atomic_t.txt provide more background to the
>  memory ordering in general and for atomic operations specifically.
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 2df5a3da563c..5123770ba77e 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -36,6 +36,7 @@ trying to get it to work optimally on a given system.
>  
>     admin-guide/index
>     kbuild/index
> +   tools/index
>  
>  Firmware-related documentation
>  ------------------------------
> diff --git a/Documentation/tools/index.rst b/Documentation/tools/index.rst
> new file mode 100644
> index 000000000000..f540d9cc75a1
> --- /dev/null
> +++ b/Documentation/tools/index.rst
> @@ -0,0 +1,17 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========
> +Linux Tools
> +===========
> +
> +.. toctree::
> +   :maxdepth: 2
> +
> +   memory-model/index
> +
> +.. only::  subproject and html
> +
> +   Indices
> +   =======
> +
> +   * :ref:`genindex`
> diff --git a/Documentation/tools/memory-model b/Documentation/tools/memory-model
> new file mode 120000
> index 000000000000..020ed38ce302
> --- /dev/null
> +++ b/Documentation/tools/memory-model
> @@ -0,0 +1 @@
> +../../tools/memory-model/Documentation/
> \ No newline at end of file
> diff --git a/tools/memory-model/Documentation/cheatsheet.rst b/tools/memory-model/Documentation/cheatsheet.rst
> new file mode 100644
> index 000000000000..35f8749b8a53
> --- /dev/null
> +++ b/tools/memory-model/Documentation/cheatsheet.rst
> @@ -0,0 +1,36 @@
> +===========
> +Cheat Sheet
> +===========
> +
> +::
> +
> +				    Prior Operation     Subsequent Operation
> +				    ---------------  ---------------------------
> +				 C  Self  R  W  RMW  Self  R  W  DR  DW  RMW  SV
> +				--  ----  -  -  ---  ----  -  -  --  --  ---  --
> +
> +  Store, e.g., WRITE_ONCE()            Y                                       Y
> +  Load, e.g., READ_ONCE()              Y                          Y   Y        Y
> +  Unsuccessful RMW operation           Y                          Y   Y        Y
> +  rcu_dereference()                    Y                          Y   Y        Y
> +  Successful *_acquire()               R                   Y  Y   Y   Y    Y   Y
> +  Successful *_release()         C        Y  Y    Y     W                      Y
> +  smp_rmb()                               Y       R        Y      Y        R
> +  smp_wmb()                                  Y    W           Y       Y    W
> +  smp_mb() & synchronize_rcu()  CP        Y  Y    Y        Y  Y   Y   Y    Y
> +  Successful full non-void RMW  CP     Y  Y  Y    Y     Y  Y  Y   Y   Y    Y   Y
> +  smp_mb__before_atomic()       CP        Y  Y    Y        a  a   a   a    Y
> +  smp_mb__after_atomic()        CP        a  a    Y        Y  Y   Y   Y    Y
> +
> +
> +  Key:    C:      Ordering is cumulative
> +	  P:      Ordering propagates
> +	  R:      Read, for example, READ_ONCE(), or read portion of RMW
> +	  W:      Write, for example, WRITE_ONCE(), or write portion of RMW
> +	  Y:      Provides ordering
> +	  a:      Provides ordering given intervening RMW atomic operation
> +	  DR:     Dependent read (address dependency)
> +	  DW:     Dependent write (address, data, or control dependency)
> +	  RMW:    Atomic read-modify-write operation
> +	  SELF:   Orders self, as opposed to accesses before and/or after
> +	  SV:     Orders later accesses to the same variable
> diff --git a/tools/memory-model/Documentation/cheatsheet.txt b/tools/memory-model/Documentation/cheatsheet.txt
> deleted file mode 100644
> index 33ba98d72b16..000000000000
> --- a/tools/memory-model/Documentation/cheatsheet.txt
> +++ /dev/null
> @@ -1,30 +0,0 @@
> -                                  Prior Operation     Subsequent Operation
> -                                  ---------------  ---------------------------
> -                               C  Self  R  W  RMW  Self  R  W  DR  DW  RMW  SV
> -                              --  ----  -  -  ---  ----  -  -  --  --  ---  --
> -
> -Store, e.g., WRITE_ONCE()            Y                                       Y
> -Load, e.g., READ_ONCE()              Y                          Y   Y        Y
> -Unsuccessful RMW operation           Y                          Y   Y        Y
> -rcu_dereference()                    Y                          Y   Y        Y
> -Successful *_acquire()               R                   Y  Y   Y   Y    Y   Y
> -Successful *_release()         C        Y  Y    Y     W                      Y
> -smp_rmb()                               Y       R        Y      Y        R
> -smp_wmb()                                  Y    W           Y       Y    W
> -smp_mb() & synchronize_rcu()  CP        Y  Y    Y        Y  Y   Y   Y    Y
> -Successful full non-void RMW  CP     Y  Y  Y    Y     Y  Y  Y   Y   Y    Y   Y
> -smp_mb__before_atomic()       CP        Y  Y    Y        a  a   a   a    Y
> -smp_mb__after_atomic()        CP        a  a    Y        Y  Y   Y   Y    Y
> -
> -
> -Key:	C:	Ordering is cumulative
> -	P:	Ordering propagates
> -	R:	Read, for example, READ_ONCE(), or read portion of RMW
> -	W:	Write, for example, WRITE_ONCE(), or write portion of RMW
> -	Y:	Provides ordering
> -	a:	Provides ordering given intervening RMW atomic operation
> -	DR:	Dependent read (address dependency)
> -	DW:	Dependent write (address, data, or control dependency)
> -	RMW:	Atomic read-modify-write operation
> -	SELF:	Orders self, as opposed to accesses before and/or after
> -	SV:	Orders later accesses to the same variable
> diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.rst
> similarity index 92%
> rename from tools/memory-model/Documentation/explanation.txt
> rename to tools/memory-model/Documentation/explanation.rst
> index 68caa9a976d0..9b5d10cef0c2 100644
> --- a/tools/memory-model/Documentation/explanation.txt
> +++ b/tools/memory-model/Documentation/explanation.rst
> @@ -1,5 +1,6 @@
> +========================================================
>  Explanation of the Linux-Kernel Memory Consistency Model
> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +========================================================
>  
>  :Author: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
>  :Created: October 2017
> @@ -107,7 +108,7 @@ and the flag are memory locations shared between the two CPUs.
>  
>  We can abstract out the important pieces of the driver code as follows
>  (the reason for using WRITE_ONCE() and READ_ONCE() instead of simple
> -assignment statements is discussed later):
> +assignment statements is discussed later)::
>  
>  	int buf = 0, flag = 0;
>  
> @@ -219,7 +220,7 @@ Ordering).  This model predicts that the undesired outcome for the MP
>  pattern cannot occur, but in other respects it differs from Sequential
>  Consistency.  One example is the Store Buffer (SB) pattern, in which
>  each CPU stores to its own shared location and then loads from the
> -other CPU's location:
> +other CPU's location::
>  
>  	int x = 0, y = 0;
>  
> @@ -264,7 +265,7 @@ The counterpart to ordering is a cycle.  Ordering rules out cycles:
>  It's not possible to have X ordered before Y, Y ordered before Z, and
>  Z ordered before X, because this would mean that X is ordered before
>  itself.  The analysis of the MP example under Sequential Consistency
> -involved just such an impossible cycle:
> +involved just such an impossible cycle::
>  
>  	W: P0 stores 1 to flag   executes before
>  	X: P1 loads 1 from flag  executes before
> @@ -331,7 +332,7 @@ can think of it as the order in which statements occur in the source
>  code after branches are taken into account and loops have been
>  unrolled.  A better description might be the order in which
>  instructions are presented to a CPU's execution unit.  Thus, we say
> -that X is po-before Y (written as "X ->po Y" in formulas) if X occurs
> +that X is po-before Y (written as ``X ->po Y`` in formulas) if X occurs
>  before Y in the instruction stream.
>  
>  This is inherently a single-CPU relation; two instructions executing
> @@ -388,7 +389,7 @@ The protections provided by READ_ONCE(), WRITE_ONCE(), and others are
>  not perfect; and under some circumstances it is possible for the
>  compiler to undermine the memory model.  Here is an example.  Suppose
>  both branches of an "if" statement store the same value to the same
> -location:
> +location::
>  
>  	r1 = READ_ONCE(x);
>  	if (r1) {
> @@ -402,7 +403,7 @@ location:
>  For this code, the LKMM predicts that the load from x will always be
>  executed before either of the stores to y.  However, a compiler could
>  lift the stores out of the conditional, transforming the code into
> -something resembling:
> +something resembling::
>  
>  	r1 = READ_ONCE(x);
>  	WRITE_ONCE(y, 2);
> @@ -418,7 +419,7 @@ model's original prediction could be invalidated by the compiler.
>  
>  Another issue arises from the fact that in C, arguments to many
>  operators and function calls can be evaluated in any order.  For
> -example:
> +example::
>  
>  	r1 = f(5) + g(6);
>  
> @@ -440,7 +441,7 @@ control (ctrl).
>  
>  A read and a write event are linked by a data dependency if the value
>  obtained by the read affects the value stored by the write.  As a very
> -simple example:
> +simple example::
>  
>  	int x, y;
>  
> @@ -455,7 +456,7 @@ values of multiple reads.
>  A read event and another memory access event are linked by an address
>  dependency if the value obtained by the read affects the location
>  accessed by the other event.  The second event can be either a read or
> -a write.  Here's another simple example:
> +a write.  Here's another simple example::
>  
>  	int a[20];
>  	int i;
> @@ -471,7 +472,7 @@ pointer.
>  
>  Finally, a read event and another memory access event are linked by a
>  control dependency if the value obtained by the read affects whether
> -the second event is executed at all.  Simple example:
> +the second event is executed at all.  Simple example::
>  
>  	int x, y;
>  
> @@ -484,9 +485,9 @@ which depends on the value obtained by the READ_ONCE(); hence there is
>  a control dependency from the load to the store.
>  
>  It should be pretty obvious that events can only depend on reads that
> -come earlier in program order.  Symbolically, if we have R ->data X,
> -R ->addr X, or R ->ctrl X (where R is a read event), then we must also
> -have R ->po X.  It wouldn't make sense for a computation to depend
> +come earlier in program order.  Symbolically, if we have ``R ->data X``,
> +``R ->addr X``, or ``R ->ctrl X`` (where R is a read event), then we must also
> +have ``R ->po X``.  It wouldn't make sense for a computation to depend
>  somehow on a value that doesn't get loaded from shared memory until
>  later in the code!
>  
> @@ -497,7 +498,7 @@ THE READS-FROM RELATION: rf, rfi, and rfe
>  The reads-from relation (rf) links a write event to a read event when
>  the value loaded by the read is the value that was stored by the
>  write.  In colloquial terms, the load "reads from" the store.  We
> -write W ->rf R to indicate that the load R reads from the store W.  We
> +write ``W ->rf R`` to indicate that the load R reads from the store W.  We
>  further distinguish the cases where the load and the store occur on
>  the same CPU (internal reads-from, or rfi) and where they occur on
>  different CPUs (external reads-from, or rfe).
> @@ -510,7 +511,7 @@ Usage of the rf relation implicitly assumes that loads will always
>  read from a single store.  It doesn't apply properly in the presence
>  of load-tearing, where a load obtains some of its bits from one store
>  and some of them from another store.  Fortunately, use of READ_ONCE()
> -and WRITE_ONCE() will prevent load-tearing; it's not possible to have:
> +and WRITE_ONCE() will prevent load-tearing; it's not possible to have::
>  
>  	int x = 0;
>  
> @@ -530,7 +531,7 @@ and end up with r1 = 0x1200 (partly from x's initial value and partly
>  from the value stored by P0).
>  
>  On the other hand, load-tearing is unavoidable when mixed-size
> -accesses are used.  Consider this example:
> +accesses are used.  Consider this example::
>  
>  	union {
>  		u32	w;
> @@ -578,26 +579,26 @@ that value comes third, and so on.
>  You can think of the coherence order as being the order in which the
>  stores reach x's location in memory (or if you prefer a more
>  hardware-centric view, the order in which the stores get written to
> -x's cache line).  We write W ->co W' if W comes before W' in the
> +x's cache line).  We write ``W ->co W'`` if W comes before W' in the
>  coherence order, that is, if the value stored by W gets overwritten,
>  directly or indirectly, by the value stored by W'.
>  
>  Coherence order is required to be consistent with program order.  This
>  requirement takes the form of four coherency rules:
>  
> -	Write-write coherence: If W ->po-loc W' (i.e., W comes before
> +	Write-write coherence: If ``W ->po-loc W'`` (i.e., W comes before
>  	W' in program order and they access the same location), where W
> -	and W' are two stores, then W ->co W'.
> +	and W' are two stores, then ``W ->co W'``.
>  
> -	Write-read coherence: If W ->po-loc R, where W is a store and R
> +	Write-read coherence: If ``W ->po-loc R``, where W is a store and R
>  	is a load, then R must read from W or from some other store
>  	which comes after W in the coherence order.
>  
> -	Read-write coherence: If R ->po-loc W, where R is a load and W
> +	Read-write coherence: If ``R ->po-loc W``, where R is a load and W
>  	is a store, then the store which R reads from must come before
>  	W in the coherence order.
>  
> -	Read-read coherence: If R ->po-loc R', where R and R' are two
> +	Read-read coherence: If ``R ->po-loc R'``, where R and R' are two
>  	loads, then either they read from the same store or else the
>  	store read by R comes before the store read by R' in the
>  	coherence order.
> @@ -612,7 +613,7 @@ requirement that every store eventually becomes visible to every CPU.)
>  Any reasonable memory model will include cache coherence.  Indeed, our
>  expectation of cache coherence is so deeply ingrained that violations
>  of its requirements look more like hardware bugs than programming
> -errors:
> +errors::
>  
>  	int x;
>  
> @@ -628,6 +629,8 @@ write-write coherence rule: Since the store of 23 comes later in
>  program order, it must also come later in x's coherence order and
>  thus must overwrite the store of 17.
>  
> +::
> +
>  	int x = 0;
>  
>  	P0()
> @@ -644,6 +647,8 @@ program order, so it must not read from that store but rather from one
>  coming earlier in the coherence order (in this case, x's initial
>  value).
>  
> +::
> +
>  	int x = 0;
>  
>  	P0()
> @@ -689,12 +694,12 @@ THE FROM-READS RELATION: fr, fri, and fre
>  
>  The from-reads relation (fr) can be a little difficult for people to
>  grok.  It describes the situation where a load reads a value that gets
> -overwritten by a store.  In other words, we have R ->fr W when the
> +overwritten by a store.  In other words, we have ``R ->fr W`` when the
>  value that R reads is overwritten (directly or indirectly) by W, or
>  equivalently, when R reads from a store which comes earlier than W in
>  the coherence order.
>  
> -For example:
> +For example::
>  
>  	int x = 0;
>  
> @@ -718,9 +723,11 @@ different CPUs).
>  
>  Note that the fr relation is determined entirely by the rf and co
>  relations; it is not independent.  Given a read event R and a write
> -event W for the same location, we will have R ->fr W if and only if
> +event W for the same location, we will have ``R ->fr W`` if and only if
>  the write which R reads from is co-before W.  In symbols,
>  
> +::
> +
>  	(R ->fr W) := (there exists W' with W' ->rf R and W' ->co W).
>  
>  
> @@ -843,13 +850,13 @@ defined to link memory access events E and F whenever:
>  	event occurs between them in program order; or
>  
>  	F is a release fence and some X comes before F in program order,
> -	where either X = E or else E ->rf X; or
> +	where either ``X = E`` or else ``E ->rf X``; or
>  
>  	A strong fence event occurs between some X and F in program
> -	order, where either X = E or else E ->rf X.
> +	order, where either ``X = E`` or else ``E ->rf X``.
>  
>  The operational model requires that whenever W and W' are both stores
> -and W ->cumul-fence W', then W must propagate to any given CPU
> +and ``W ->cumul-fence W'``, then W must propagate to any given CPU
>  before W' does.  However, for different CPUs C and C', it does not
>  require W to propagate to C before W' propagates to C'.
>  
> @@ -903,11 +910,11 @@ first for CPU 0, then CPU 1, etc.
>  
>  You can check that the four coherency rules imply that the rf, co, fr,
>  and po-loc relations agree with this global ordering; in other words,
> -whenever we have X ->rf Y or X ->co Y or X ->fr Y or X ->po-loc Y, the
> +whenever we have ``X ->rf Y`` or ``X ->co Y`` or ``X ->fr Y`` or ``X ->po-loc Y``, the
>  X event comes before the Y event in the global ordering.  The LKMM's
>  "coherence" axiom expresses this by requiring the union of these
>  relations not to have any cycles.  This means it must not be possible
> -to find events
> +to find events::
>  
>  	X0 -> X1 -> X2 -> ... -> Xn -> X0,
>  
> @@ -929,7 +936,7 @@ this case) does not get altered between the read and the write events
>  making up the atomic operation.  In particular, if two CPUs perform
>  atomic_inc(&x) concurrently, it must be guaranteed that the final
>  value of x will be the initial value plus two.  We should never have
> -the following sequence of events:
> +the following sequence of events::
>  
>  	CPU 0 loads x obtaining 13;
>  					CPU 1 loads x obtaining 13;
> @@ -951,6 +958,8 @@ atomic read-modify-write and W' is the write event which R reads from,
>  there must not be any stores coming between W' and W in the coherence
>  order.  Equivalently,
>  
> +::
> +
>  	(R ->rmw W) implies (there is no X with R ->fr X and X ->co W),
>  
>  where the rmw relation links the read and write events making up each
> @@ -968,7 +977,7 @@ po.
>  
>  The operational model already includes a description of one such
>  situation: Fences are a source of ppo links.  Suppose X and Y are
> -memory accesses with X ->po Y; then the CPU must execute X before Y if
> +memory accesses with ``X ->po Y``; then the CPU must execute X before Y if
>  any of the following hold:
>  
>  	A strong (smp_mb() or synchronize_rcu()) fence occurs between
> @@ -987,7 +996,7 @@ any of the following hold:
>  Another possibility, not mentioned earlier but discussed in the next
>  section, is:
>  
> -	X and Y are both loads, X ->addr Y (i.e., there is an address
> +	X and Y are both loads, ``X ->addr Y`` (i.e., there is an address
>  	dependency from X to Y), and X is a READ_ONCE() or an atomic
>  	access.
>  
> @@ -1021,7 +1030,7 @@ includes address dependencies to loads in the ppo relation.
>  
>  On the other hand, dependencies can indirectly affect the ordering of
>  two loads.  This happens when there is a dependency from a load to a
> -store and a second, po-later load reads from that store:
> +store and a second, po-later load reads from that store::
>  
>  	R ->dep W ->rfi R',
>  
> @@ -1036,7 +1045,7 @@ R; if the speculation turns out to be wrong then the CPU merely has to
>  restart or abandon R'.
>  
>  (In theory, a CPU might forward a store to a load when it runs across
> -an address dependency like this:
> +an address dependency like this::
>  
>  	r1 = READ_ONCE(ptr);
>  	WRITE_ONCE(*r1, 17);
> @@ -1048,7 +1057,7 @@ However, none of the architectures supported by the Linux kernel do
>  this.)
>  
>  Two memory accesses of the same location must always be executed in
> -program order if the second access is a store.  Thus, if we have
> +program order if the second access is a store.  Thus, if we have::
>  
>  	R ->po-loc W
>  
> @@ -1056,7 +1065,7 @@ program order if the second access is a store.  Thus, if we have
>  access the same location), the CPU is obliged to execute W after R.
>  If it executed W first then the memory subsystem would respond to R's
>  read request with the value stored by W (or an even later store), in
> -violation of the read-write coherence rule.  Similarly, if we had
> +violation of the read-write coherence rule.  Similarly, if we had::
>  
>  	W ->po-loc W'
>  
> @@ -1074,7 +1083,7 @@ AND THEN THERE WAS ALPHA
>  
>  As mentioned above, the Alpha architecture is unique in that it does
>  not appear to respect address dependencies to loads.  This means that
> -code such as the following:
> +code such as the following::
>  
>  	int x = 0;
>  	int y = -1;
> @@ -1128,7 +1137,7 @@ nothing at all on non-Alpha builds) after every READ_ONCE() and atomic
>  load.  The effect of the fence is to cause the CPU not to execute any
>  po-later instructions until after the local cache has finished
>  processing all the stores it has already received.  Thus, if the code
> -was changed to:
> +was changed to::
>  
>  	P1()
>  	{
> @@ -1167,25 +1176,25 @@ The happens-before relation (hb) links memory accesses that have to
>  execute in a certain order.  hb includes the ppo relation and two
>  others, one of which is rfe.
>  
> -W ->rfe R implies that W and R are on different CPUs.  It also means
> +``W ->rfe R`` implies that W and R are on different CPUs.  It also means
>  that W's store must have propagated to R's CPU before R executed;
>  otherwise R could not have read the value stored by W.  Therefore W
> -must have executed before R, and so we have W ->hb R.
> +must have executed before R, and so we have ``W ->hb R``.
>  
> -The equivalent fact need not hold if W ->rfi R (i.e., W and R are on
> +The equivalent fact need not hold if ``W ->rfi R`` (i.e., W and R are on
>  the same CPU).  As we have already seen, the operational model allows
>  W's value to be forwarded to R in such cases, meaning that R may well
>  execute before W does.
>  
>  It's important to understand that neither coe nor fre is included in
>  hb, despite their similarities to rfe.  For example, suppose we have
> -W ->coe W'.  This means that W and W' are stores to the same location,
> +``W ->coe W'``.  This means that W and W' are stores to the same location,
>  they execute on different CPUs, and W comes before W' in the coherence
>  order (i.e., W' overwrites W).  Nevertheless, it is possible for W' to
>  execute before W, because the decision as to which store overwrites
>  the other is made later by the memory subsystem.  When the stores are
>  nearly simultaneous, either one can come out on top.  Similarly,
> -R ->fre W means that W overwrites the value which R reads, but it
> +``R ->fre W`` means that W overwrites the value which R reads, but it
>  doesn't mean that W has to execute after R.  All that's necessary is
>  for the memory subsystem not to propagate W to R's CPU until after R
>  has executed, which is possible if W executes shortly before R.
> @@ -1199,7 +1208,7 @@ the first event in the coherence order and propagates to C before the
>  second event executes.
>  
>  This is best explained with some examples.  The simplest case looks
> -like this:
> +like this::
>  
>  	int x;
>  
> @@ -1225,7 +1234,7 @@ event, because P1's store came after P0's store in x's coherence
>  order, and P1's store propagated to P0 before P0's load executed.
>  
>  An equally simple case involves two loads of the same location that
> -read from different stores:
> +read from different stores::
>  
>  	int x = 0;
>  
> @@ -1252,7 +1261,7 @@ P1's store propagated to P0 before P0's second load executed.
>  
>  Less trivial examples of prop all involve fences.  Unlike the simple
>  examples above, they can require that some instructions are executed
> -out of program order.  This next one should look familiar:
> +out of program order.  This next one should look familiar::
>  
>  	int buf = 0, flag = 0;
>  
> @@ -1303,7 +1312,7 @@ rfe link.  You can concoct more exotic examples, containing more than
>  one fence, although this quickly leads to diminishing returns in terms
>  of complexity.  For instance, here's an example containing a coe link
>  followed by two fences and an rfe link, utilizing the fact that
> -release fences are A-cumulative:
> +release fences are A-cumulative::
>  
>  	int x, y, z;
>  
> @@ -1363,7 +1372,7 @@ links.  Let's see how this definition works out.
>  
>  Consider first the case where E is a store (implying that the sequence
>  of links begins with coe).  Then there are events W, X, Y, and Z such
> -that:
> +that::
>  
>  	E ->coe W ->cumul-fence* X ->rfe? Y ->strong-fence Z ->hb* F,
>  
> @@ -1384,13 +1393,13 @@ The existence of a pb link from E to F implies that E must execute
>  before F.  To see why, suppose that F executed first.  Then W would
>  have propagated to E's CPU before E executed.  If E was a store, the
>  memory subsystem would then be forced to make E come after W in the
> -coherence order, contradicting the fact that E ->coe W.  If E was a
> +coherence order, contradicting the fact that ``E ->coe W``.  If E was a
>  load, the memory subsystem would then be forced to satisfy E's read
>  request with the value stored by W or an even later store,
> -contradicting the fact that E ->fre W.
> +contradicting the fact that ``E ->fre W``.
>  
>  A good example illustrating how pb works is the SB pattern with strong
> -fences:
> +fences::
>  
>  	int x = 0, y = 0;
>  
> @@ -1449,18 +1458,18 @@ span a full grace period.  In more detail, the Guarantee says:
>  	For any critical section C and any grace period G, at least
>  	one of the following statements must hold:
>  
> -(1)	C ends before G does, and in addition, every store that
> -	propagates to C's CPU before the end of C must propagate to
> -	every CPU before G ends.
> +	(1)	C ends before G does, and in addition, every store that
> +		propagates to C's CPU before the end of C must propagate to
> +		every CPU before G ends.
>  
> -(2)	G starts before C does, and in addition, every store that
> -	propagates to G's CPU before the start of G must propagate
> -	to every CPU before C starts.
> +	(2)	G starts before C does, and in addition, every store that
> +		propagates to G's CPU before the start of G must propagate
> +		to every CPU before C starts.
>  
>  In particular, it is not possible for a critical section to both start
>  before and end after a grace period.
>  
> -Here is a simple example of RCU in action:
> +Here is a simple example of RCU in action::
>  
>  	int x, y;
>  
> @@ -1509,9 +1518,9 @@ entirely clear.  The LKMM formalizes this notion by means of the
>  rcu-link relation.  rcu-link encompasses a very general notion of
>  "before": If E and F are RCU fence events (i.e., rcu_read_lock(),
>  rcu_read_unlock(), or synchronize_rcu()) then among other things,
> -E ->rcu-link F includes cases where E is po-before some memory-access
> +``E ->rcu-link F`` includes cases where E is po-before some memory-access
>  event X, F is po-after some memory-access event Y, and we have any of
> -X ->rfe Y, X ->co Y, or X ->fr Y.
> +``X ->rfe Y``, ``X ->co Y``, or ``X ->fr Y``.
>  
>  The formal definition of the rcu-link relation is more than a little
>  obscure, and we won't give it here.  It is closely related to the pb
> @@ -1523,50 +1532,50 @@ The LKMM also defines the rcu-gp and rcu-rscsi relations.  They bring
>  grace periods and read-side critical sections into the picture, in the
>  following way:
>  
> -	E ->rcu-gp F means that E and F are in fact the same event,
> +	``E ->rcu-gp F`` means that E and F are in fact the same event,
>  	and that event is a synchronize_rcu() fence (i.e., a grace
>  	period).
>  
> -	E ->rcu-rscsi F means that E and F are the rcu_read_unlock()
> +	``E ->rcu-rscsi F`` means that E and F are the rcu_read_unlock()
>  	and rcu_read_lock() fence events delimiting some read-side
>  	critical section.  (The 'i' at the end of the name emphasizes
>  	that this relation is "inverted": It links the end of the
>  	critical section to the start.)
>  
>  If we think of the rcu-link relation as standing for an extended
> -"before", then X ->rcu-gp Y ->rcu-link Z roughly says that X is a
> +"before", then ``X ->rcu-gp Y ->rcu-link Z`` roughly says that X is a
>  grace period which ends before Z begins.  (In fact it covers more than
>  this, because it also includes cases where some store propagates to
>  Z's CPU before Z begins but doesn't propagate to some other CPU until
> -after X ends.)  Similarly, X ->rcu-rscsi Y ->rcu-link Z says that X is
> +after X ends.)  Similarly, ``X ->rcu-rscsi Y ->rcu-link Z`` says that X is
>  the end of a critical section which starts before Z begins.
>  
>  The LKMM goes on to define the rcu-fence relation as a sequence of
>  rcu-gp and rcu-rscsi links separated by rcu-link links, in which the
>  number of rcu-gp links is >= the number of rcu-rscsi links.  For
> -example:
> +example::
>  
>  	X ->rcu-gp Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
>  
> -would imply that X ->rcu-fence V, because this sequence contains two
> +would imply that ``X ->rcu-fence V``, because this sequence contains two
>  rcu-gp links and one rcu-rscsi link.  (It also implies that
> -X ->rcu-fence T and Z ->rcu-fence V.)  On the other hand:
> +``X ->rcu-fence T`` and ``Z ->rcu-fence V``.)  On the other hand::
>  
>  	X ->rcu-rscsi Y ->rcu-link Z ->rcu-rscsi T ->rcu-link U ->rcu-gp V
>  
> -does not imply X ->rcu-fence V, because the sequence contains only
> +does not imply ``X ->rcu-fence V``, because the sequence contains only
>  one rcu-gp link but two rcu-rscsi links.
>  
>  The rcu-fence relation is important because the Grace Period Guarantee
>  means that rcu-fence acts kind of like a strong fence.  In particular,
> -E ->rcu-fence F implies not only that E begins before F ends, but also
> +``E ->rcu-fence F`` implies not only that E begins before F ends, but also
>  that any write po-before E will propagate to every CPU before any
>  instruction po-after F can execute.  (However, it does not imply that
>  E must execute before F; in fact, each synchronize_rcu() fence event
>  is linked to itself by rcu-fence as a degenerate case.)
>  
>  To prove this in full generality requires some intellectual effort.
> -We'll consider just a very simple case:
> +We'll consider just a very simple case::
>  
>  	G ->rcu-gp W ->rcu-link Z ->rcu-rscsi F.
>  
> @@ -1595,13 +1604,13 @@ covered by rcu-fence.
>  Finally, the LKMM defines the RCU-before (rb) relation in terms of
>  rcu-fence.  This is done in essentially the same way as the pb
>  relation was defined in terms of strong-fence.  We will omit the
> -details; the end result is that E ->rb F implies E must execute
> -before F, just as E ->pb F does (and for much the same reasons).
> +details; the end result is that ``E ->rb F`` implies E must execute
> +before F, just as ``E ->pb F`` does (and for much the same reasons).
>  
>  Putting this all together, the LKMM expresses the Grace Period
>  Guarantee by requiring that the rb relation does not contain a cycle.
>  Equivalently, this "rcu" axiom requires that there are no events E
> -and F with E ->rcu-link F ->rcu-fence E.  Or to put it a third way,
> +and F with ``E ->rcu-link F ->rcu-fence E``.  Or to put it a third way,
>  the axiom requires that there are no cycles consisting of rcu-gp and
>  rcu-rscsi alternating with rcu-link, where the number of rcu-gp links
>  is >= the number of rcu-rscsi links.
> @@ -1621,7 +1630,7 @@ question, and let S be the synchronize_rcu() fence event for the grace
>  period.  Saying that the critical section starts before S means there
>  are events Q and R where Q is po-after L (which marks the start of the
>  critical section), Q is "before" R in the sense used by the rcu-link
> -relation, and R is po-before the grace period S.  Thus we have:
> +relation, and R is po-before the grace period S.  Thus we have::
>  
>  	L ->rcu-link S.
>  
> @@ -1629,20 +1638,20 @@ Let W be the store mentioned above, let Y come before the end of the
>  critical section and witness that W propagates to the critical
>  section's CPU by reading from W, and let Z on some arbitrary CPU be a
>  witness that W has not propagated to that CPU, where Z happens after
> -some event X which is po-after S.  Symbolically, this amounts to:
> +some event X which is po-after S.  Symbolically, this amounts to::
>  
>  	S ->po X ->hb* Z ->fr W ->rf Y ->po U.
>  
>  The fr link from Z to W indicates that W has not propagated to Z's CPU
>  at the time that Z executes.  From this, it can be shown (see the
>  discussion of the rcu-link relation earlier) that S and U are related
> -by rcu-link:
> +by rcu-link::
>  
>  	S ->rcu-link U.
>  
> -Since S is a grace period we have S ->rcu-gp S, and since L and U are
> -the start and end of the critical section C we have U ->rcu-rscsi L.
> -From this we obtain:
> +Since S is a grace period we have ``S ->rcu-gp S``, and since L and U are
> +the start and end of the critical section C we have ``U ->rcu-rscsi L``.
> +From this we obtain::
>  
>  	S ->rcu-gp S ->rcu-link U ->rcu-rscsi L ->rcu-link S,
>  
> @@ -1651,7 +1660,7 @@ the Grace Period Guarantee.
>  
>  For something a little more down-to-earth, let's see how the axiom
>  works out in practice.  Consider the RCU code example from above, this
> -time with statement labels added:
> +time with statement labels added::
>  
>  	int x, y;
>  
> @@ -1674,20 +1683,20 @@ time with statement labels added:
>  
>  
>  If r2 = 0 at the end then P0's store at Y overwrites the value that
> -P1's load at W reads from, so we have W ->fre Y.  Since S ->po W and
> -also Y ->po U, we get S ->rcu-link U.  In addition, S ->rcu-gp S
> +P1's load at W reads from, so we have ``W ->fre Y``.  Since ``S ->po W`` and
> +also ``Y ->po U``, we get ``S ->rcu-link U``.  In addition, ``S ->rcu-gp S``
>  because S is a grace period.
>  
>  If r1 = 1 at the end then P1's load at Z reads from P0's store at X,
> -so we have X ->rfe Z.  Together with L ->po X and Z ->po S, this
> -yields L ->rcu-link S.  And since L and U are the start and end of a
> -critical section, we have U ->rcu-rscsi L.
> +so we have ``X ->rfe Z``.  Together with ``L ->po X`` and ``Z ->po S``, this
> +yields ``L ->rcu-link S``.  And since L and U are the start and end of a
> +critical section, we have ``U ->rcu-rscsi L``.
>  
> -Then U ->rcu-rscsi L ->rcu-link S ->rcu-gp S ->rcu-link U is a
> +Then ``U ->rcu-rscsi L ->rcu-link S ->rcu-gp S ->rcu-link U`` is a
>  forbidden cycle, violating the "rcu" axiom.  Hence the outcome is not
>  allowed by the LKMM, as we would expect.
>  
> -For contrast, let's see what can happen in a more complicated example:
> +For contrast, let's see what can happen in a more complicated example::
>  
>  	int x, y, z;
>  
> @@ -1720,28 +1729,28 @@ For contrast, let's see what can happen in a more complicated example:
>  		U2: rcu_read_unlock();
>  	}
>  
> -If r0 = r1 = r2 = 1 at the end, then similar reasoning to before shows
> -that U0 ->rcu-rscsi L0 ->rcu-link S1 ->rcu-gp S1 ->rcu-link U2 ->rcu-rscsi
> -L2 ->rcu-link U0.  However this cycle is not forbidden, because the
> +If ``r0 = r1 = r2 = 1`` at the end, then similar reasoning to before shows
> +that ``U0 ->rcu-rscsi L0 ->rcu-link S1 ->rcu-gp S1 ->rcu-link U2 ->rcu-rscsi
> +L2 ->rcu-link U0``.  However this cycle is not forbidden, because the
>  sequence of relations contains fewer instances of rcu-gp (one) than of
>  rcu-rscsi (two).  Consequently the outcome is allowed by the LKMM.
>  The following instruction timing diagram shows how it might actually
> -occur:
> -
> -P0			P1			P2
> ---------------------	--------------------	--------------------
> -rcu_read_lock()
> -WRITE_ONCE(y, 1)
> -			r1 = READ_ONCE(y)
> -			synchronize_rcu() starts
> -			.			rcu_read_lock()
> -			.			WRITE_ONCE(x, 1)
> -r0 = READ_ONCE(x)	.
> -rcu_read_unlock()	.
> -			synchronize_rcu() ends
> -			WRITE_ONCE(z, 1)
> -						r2 = READ_ONCE(z)
> -						rcu_read_unlock()
> +occur::
> +
> +	P0			P1			P2
> +	--------------------	--------------------	--------------------
> +	rcu_read_lock()
> +	WRITE_ONCE(y, 1)
> +				r1 = READ_ONCE(y)
> +				synchronize_rcu() starts
> +				.			rcu_read_lock()
> +				.			WRITE_ONCE(x, 1)
> +	r0 = READ_ONCE(x)	.
> +	rcu_read_unlock()	.
> +				synchronize_rcu() ends
> +				WRITE_ONCE(z, 1)
> +							r2 = READ_ONCE(z)
> +							rcu_read_unlock()
>  
>  This requires P0 and P2 to execute their loads and stores out of
>  program order, but of course they are allowed to do so.  And as you
> @@ -1767,26 +1776,26 @@ The LKMM includes locking.  In fact, there is special code for locking
>  in the formal model, added in order to make tools run faster.
>  However, this special code is intended to be more or less equivalent
>  to concepts we have already covered.  A spinlock_t variable is treated
> -the same as an int, and spin_lock(&s) is treated almost the same as:
> +the same as an int, and spin_lock(&s) is treated almost the same as::
>  
>  	while (cmpxchg_acquire(&s, 0, 1) != 0)
>  		cpu_relax();
>  
>  This waits until s is equal to 0 and then atomically sets it to 1,
>  and the read part of the cmpxchg operation acts as an acquire fence.
> -An alternate way to express the same thing would be:
> +An alternate way to express the same thing would be::
>  
>  	r = xchg_acquire(&s, 1);
>  
>  along with a requirement that at the end, r = 0.  Similarly,
> -spin_trylock(&s) is treated almost the same as:
> +spin_trylock(&s) is treated almost the same as::
>  
>  	return !cmpxchg_acquire(&s, 0, 1);
>  
>  which atomically sets s to 1 if it is currently equal to 0 and returns
>  true if it succeeds (the read part of the cmpxchg operation acts as an
>  acquire fence only if the operation is successful).  spin_unlock(&s)
> -is treated almost the same as:
> +is treated almost the same as::
>  
>  	smp_store_release(&s, 0);
>  
> @@ -1802,7 +1811,7 @@ requires that every instruction po-before the lock-release must
>  execute before any instruction po-after the lock-acquire.  This would
>  naturally hold if the release and acquire operations were on different
>  CPUs, but the LKMM says it holds even when they are on the same CPU.
> -For example:
> +For example::
>  
>  	int x, y;
>  	spinlock_t s;
> @@ -1833,7 +1842,7 @@ MP pattern).
>  
>  This requirement does not apply to ordinary release and acquire
>  fences, only to lock-related operations.  For instance, suppose P0()
> -in the example had been written as:
> +in the example had been written as::
>  
>  	P0()
>  	{
> @@ -1847,7 +1856,7 @@ in the example had been written as:
>  
>  Then the CPU would be allowed to forward the s = 1 value from the
>  smp_store_release() to the smp_load_acquire(), executing the
> -instructions in the following order:
> +instructions in the following order::
>  
>  		r3 = smp_load_acquire(&s);	// Obtains r3 = 1
>  		r2 = READ_ONCE(y);
> @@ -1859,7 +1868,7 @@ and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
>  Second, when a lock-acquire reads from a lock-release, and some other
>  stores W and W' occur po-before the lock-release and po-after the
>  lock-acquire respectively, the LKMM requires that W must propagate to
> -each CPU before W' does.  For example, consider:
> +each CPU before W' does.  For example, consider::
>  
>  	int x, y;
>  	spinlock_t x;
> @@ -1928,7 +1937,7 @@ smp_store_release().)  The final effect is the same.
>  Although we didn't mention it above, the instruction execution
>  ordering provided by the smp_rmb() fence doesn't apply to read events
>  that are part of a non-value-returning atomic update.  For instance,
> -given:
> +given::
>  
>  	atomic_inc(&x);
>  	smp_rmb();
> @@ -1967,14 +1976,14 @@ they behave as follows:
>  	events.
>  
>  Interestingly, RCU and locking each introduce the possibility of
> -deadlock.  When faced with code sequences such as:
> +deadlock.  When faced with code sequences such as::
>  
>  	spin_lock(&s);
>  	spin_lock(&s);
>  	spin_unlock(&s);
>  	spin_unlock(&s);
>  
> -or:
> +or::
>  
>  	rcu_read_lock();
>  	synchronize_rcu();
> @@ -1984,7 +1993,7 @@ what does the LKMM have to say?  Answer: It says there are no allowed
>  executions at all, which makes sense.  But this can also lead to
>  misleading results, because if a piece of code has multiple possible
>  executions, some of which deadlock, the model will report only on the
> -non-deadlocking executions.  For example:
> +non-deadlocking executions.  For example::
>  
>  	int x, y;
>  
> diff --git a/tools/memory-model/Documentation/index.rst b/tools/memory-model/Documentation/index.rst
> new file mode 100644
> index 000000000000..0e53d83a5a48
> --- /dev/null
> +++ b/tools/memory-model/Documentation/index.rst
> @@ -0,0 +1,20 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +========================
> +Linux Memory Model Tools
> +========================
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   explanation
> +   recipes
> +   references
> +   cheatsheet
> +
> +.. only::  subproject and html
> +
> +   Indices
> +   =======
> +
> +   * :ref:`genindex`
> diff --git a/tools/memory-model/Documentation/recipes.txt b/tools/memory-model/Documentation/recipes.rst
> similarity index 96%
> rename from tools/memory-model/Documentation/recipes.txt
> rename to tools/memory-model/Documentation/recipes.rst
> index 7fe8d7aa3029..0229a431b1a2 100644
> --- a/tools/memory-model/Documentation/recipes.txt
> +++ b/tools/memory-model/Documentation/recipes.rst
> @@ -1,3 +1,8 @@
> +=======
> +Recipes
> +=======
> +
> +
>  This document provides "recipes", that is, litmus tests for commonly
>  occurring situations, as well as a few that illustrate subtly broken but
>  attractive nuisances.  Many of these recipes include example code from
> @@ -67,7 +72,7 @@ has acquired a given lock sees any changes previously seen or made by any
>  CPU before it released that same lock.  Note that this statement is a bit
>  stronger than "Any CPU holding a given lock sees all changes made by any
>  CPU during the time that CPU was holding this same lock".  For example,
> -consider the following pair of code fragments:
> +consider the following pair of code fragments::
>  
>  	/* See MP+polocks.litmus. */
>  	void CPU0(void)
> @@ -93,7 +98,7 @@ value of r1 must also be equal to 1.  In contrast, the weaker rule would
>  say nothing about the final value of r1.
>  
>  The converse to the basic rule also holds, as illustrated by the
> -following litmus test:
> +following litmus test::
>  
>  	/* See MP+porevlocks.litmus. */
>  	void CPU0(void)
> @@ -124,7 +129,7 @@ across multiple CPUs.
>  
>  However, it is not necessarily the case that accesses ordered by
>  locking will be seen as ordered by CPUs not holding that lock.
> -Consider this example:
> +Consider this example::
>  
>  	/* See Z6.0+pooncerelease+poacquirerelease+fencembonceonce.litmus. */
>  	void CPU0(void)
> @@ -157,7 +162,7 @@ CPU2() never acquired the lock, and thus did not benefit from the
>  lock's ordering properties.
>  
>  Ordering can be extended to CPUs not holding the lock by careful use
> -of smp_mb__after_spinlock():
> +of smp_mb__after_spinlock()::
>  
>  	/* See Z6.0+pooncelock+poonceLock+pombonce.litmus. */
>  	void CPU0(void)
> @@ -214,7 +219,7 @@ Release and acquire
>  ~~~~~~~~~~~~~~~~~~~
>  
>  Use of smp_store_release() and smp_load_acquire() is one way to force
> -the desired MP ordering.  The general approach is shown below:
> +the desired MP ordering.  The general approach is shown below::
>  
>  	/* See MP+pooncerelease+poacquireonce.litmus. */
>  	void CPU0(void)
> @@ -245,7 +250,7 @@ Assign and dereference
>  Use of rcu_assign_pointer() and rcu_dereference() is quite similar to the
>  use of smp_store_release() and smp_load_acquire(), except that both
>  rcu_assign_pointer() and rcu_dereference() operate on RCU-protected
> -pointers.  The general approach is shown below:
> +pointers.  The general approach is shown below::
>  
>  	/* See MP+onceassign+derefonce.litmus. */
>  	int z;
> @@ -290,7 +295,7 @@ Write and read memory barriers
>  It is usually better to use smp_store_release() instead of smp_wmb()
>  and to use smp_load_acquire() instead of smp_rmb().  However, the older
>  smp_wmb() and smp_rmb() APIs are still heavily used, so it is important
> -to understand their use cases.  The general approach is shown below:
> +to understand their use cases.  The general approach is shown below::
>  
>  	/* See MP+fencewmbonceonce+fencermbonceonce.litmus. */
>  	void CPU0(void)
> @@ -312,7 +317,7 @@ smp_rmb() macro orders prior loads against later loads.  Therefore, if
>  the final value of r0 is 1, the final value of r1 must also be 1.
>  
>  The xlog_state_switch_iclogs() function in fs/xfs/xfs_log.c contains
> -the following write-side code fragment:
> +the following write-side code fragment::
>  
>  	log->l_curr_block -= log->l_logBBsize;
>  	ASSERT(log->l_curr_block >= 0);
> @@ -327,7 +332,7 @@ the corresponding read-side code fragment:
>  	cur_block = READ_ONCE(log->l_curr_block);
>  
>  Alternatively, consider the following comment in function
> -perf_output_put_handle() in kernel/events/ring_buffer.c:
> +perf_output_put_handle() in kernel/events/ring_buffer.c::
>  
>  	 *   kernel				user
>  	 *
> @@ -358,7 +363,7 @@ absence of any ordering it is quite possible that this may happen, as
>  can be seen in the LB+poonceonces.litmus litmus test.
>  
>  One way of avoiding the counter-intuitive outcome is through the use of a
> -control dependency paired with a full memory barrier:
> +control dependency paired with a full memory barrier::
>  
>  	/* See LB+fencembonceonce+ctrlonceonce.litmus. */
>  	void CPU0(void)
> @@ -382,7 +387,7 @@ The A/D pairing from the ring-buffer use case shown earlier also
>  illustrates LB.  Here is a repeat of the comment in
>  perf_output_put_handle() in kernel/events/ring_buffer.c, showing a
>  control dependency on the kernel side and a full memory barrier on
> -the user side:
> +the user side::
>  
>  	 *   kernel				user
>  	 *
> @@ -407,7 +412,7 @@ Release-acquire chains
>  
>  Release-acquire chains are a low-overhead, flexible, and easy-to-use
>  method of maintaining order.  However, they do have some limitations that
> -need to be fully understood.  Here is an example that maintains order:
> +need to be fully understood.  Here is an example that maintains order::
>  
>  	/* See ISA2+pooncerelease+poacquirerelease+poacquireonce.litmus. */
>  	void CPU0(void)
> @@ -436,7 +441,7 @@ example, ordering would still be preserved if CPU1()'s smp_load_acquire()
>  invocation was replaced with READ_ONCE().
>  
>  It is tempting to assume that CPU0()'s store to x is globally ordered
> -before CPU1()'s store to z, but this is not the case:
> +before CPU1()'s store to z, but this is not the case::
>  
>  	/* See Z6.0+pooncerelease+poacquirerelease+mbonceonce.litmus. */
>  	void CPU0(void)
> @@ -474,7 +479,7 @@ Store buffering
>  Store buffering can be thought of as upside-down load buffering, so
>  that one CPU first stores to one variable and then loads from a second,
>  while another CPU stores to the second variable and then loads from the
> -first.  Preserving order requires nothing less than full barriers:
> +first.  Preserving order requires nothing less than full barriers::
>  
>  	/* See SB+fencembonceonces.litmus. */
>  	void CPU0(void)
> @@ -498,7 +503,7 @@ this counter-intuitive outcome.
>  This pattern most famously appears as part of Dekker's locking
>  algorithm, but it has a much more practical use within the Linux kernel
>  of ordering wakeups.  The following comment taken from waitqueue_active()
> -in include/linux/wait.h shows the canonical pattern:
> +in include/linux/wait.h shows the canonical pattern::
>  
>   *      CPU0 - waker                    CPU1 - waiter
>   *
> @@ -550,16 +555,16 @@ The strength of memory ordering required for a given litmus test to
>  avoid a counter-intuitive outcome depends on the types of relations
>  linking the memory accesses for the outcome in question:
>  
> -o	If all links are write-to-read links, then the weakest
> +-	If all links are write-to-read links, then the weakest
>  	possible ordering within each CPU suffices.  For example, in
>  	the LB litmus test, a control dependency was enough to do the
>  	job.
>  
> -o	If all but one of the links are write-to-read links, then a
> +-	If all but one of the links are write-to-read links, then a
>  	release-acquire chain suffices.  Both the MP and the ISA2
>  	litmus tests illustrate this case.
>  
> -o	If more than one of the links are something other than
> +-	If more than one of the links are something other than
>  	write-to-read links, then a full memory barrier is required
>  	between each successive pair of non-write-to-read links.  This
>  	case is illustrated by the Z6.0 litmus tests, both in the
> diff --git a/tools/memory-model/Documentation/references.txt b/tools/memory-model/Documentation/references.rst
> similarity index 71%
> rename from tools/memory-model/Documentation/references.txt
> rename to tools/memory-model/Documentation/references.rst
> index b177f3e4a614..275876cd10b8 100644
> --- a/tools/memory-model/Documentation/references.txt
> +++ b/tools/memory-model/Documentation/references.rst
> @@ -1,3 +1,7 @@
> +==========
> +References
> +==========
> +
>  This document provides background reading for memory models and related
>  tools.  These documents are aimed at kernel hackers who are interested
>  in memory models.
> @@ -6,64 +10,64 @@ in memory models.
>  Hardware manuals and models
>  ===========================
>  
> -o	SPARC International Inc. (Ed.). 1994. "The SPARC Architecture
> +-	SPARC International Inc. (Ed.). 1994. "The SPARC Architecture
>  	Reference Manual Version 9". SPARC International Inc.
>  
> -o	Compaq Computer Corporation (Ed.). 2002. "Alpha Architecture
> +-	Compaq Computer Corporation (Ed.). 2002. "Alpha Architecture
>  	Reference Manual".  Compaq Computer Corporation.
>  
> -o	Intel Corporation (Ed.). 2002. "A Formal Specification of Intel
> +-	Intel Corporation (Ed.). 2002. "A Formal Specification of Intel
>  	Itanium Processor Family Memory Ordering". Intel Corporation.
>  
> -o	Intel Corporation (Ed.). 2002. "Intel 64 and IA-32 Architectures
> +-	Intel Corporation (Ed.). 2002. "Intel 64 and IA-32 Architectures
>  	Software Developer’s Manual". Intel Corporation.
>  
> -o	Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli,
> +-	Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli,
>  	and Magnus O. Myreen. 2010. "x86-TSO: A Rigorous and Usable
>  	Programmer's Model for x86 Multiprocessors". Commun. ACM 53, 7
>  	(July, 2010), 89-97. http://doi.acm.org/10.1145/1785414.1785443
>  
> -o	IBM Corporation (Ed.). 2009. "Power ISA Version 2.06". IBM
> +-	IBM Corporation (Ed.). 2009. "Power ISA Version 2.06". IBM
>  	Corporation.
>  
> -o	ARM Ltd. (Ed.). 2009. "ARM Barrier Litmus Tests and Cookbook".
> +-	ARM Ltd. (Ed.). 2009. "ARM Barrier Litmus Tests and Cookbook".
>  	ARM Ltd.
>  
> -o	Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and
> +-	Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and
>  	Derek Williams.  2011. "Understanding POWER Multiprocessors". In
>  	Proceedings of the 32Nd ACM SIGPLAN Conference on Programming
>  	Language Design and Implementation (PLDI ’11). ACM, New York,
>  	NY, USA, 175–186.
>  
> -o	Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty,
> +-	Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty,
>  	Peter Sewell, Luc Maranget, Jade Alglave, and Derek Williams.
>  	2012. "Synchronising C/C++ and POWER". In Proceedings of the 33rd
>  	ACM SIGPLAN Conference on Programming Language Design and
>  	Implementation (PLDI '12). ACM, New York, NY, USA, 311-322.
>  
> -o	ARM Ltd. (Ed.). 2014. "ARM Architecture Reference Manual (ARMv8,
> +-	ARM Ltd. (Ed.). 2014. "ARM Architecture Reference Manual (ARMv8,
>  	for ARMv8-A architecture profile)". ARM Ltd.
>  
> -o	Imagination Technologies, LTD. 2015. "MIPS(R) Architecture
> +-	Imagination Technologies, LTD. 2015. "MIPS(R) Architecture
>  	For Programmers, Volume II-A: The MIPS64(R) Instruction,
>  	Set Reference Manual". Imagination Technologies,
>  	LTD. https://imgtec.com/?do-download=4302.
>  
> -o	Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit
> +-	Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit
>  	Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter
>  	Sewell. 2016. "Modelling the ARMv8 Architecture, Operationally:
>  	Concurrency and ISA". In Proceedings of the 43rd Annual ACM
>  	SIGPLAN-SIGACT Symposium on Principles of Programming Languages
>  	(POPL ’16). ACM, New York, NY, USA, 608–621.
>  
> -o	Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis,
> +-	Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis,
>  	Luc Maranget, Kathryn E. Gray, Ali Sezgin, Mark Batty, and Peter
>  	Sewell. 2017. "Mixed-size Concurrency: ARM, POWER, C/C++11,
>  	and SC". In Proceedings of the 44th ACM SIGPLAN Symposium on
>  	Principles of Programming Languages (POPL 2017). ACM, New York,
>  	NY, USA, 429–442.
>  
> -o	Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
> +-	Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
>  	Susmit Sarkar, and Peter Sewell. 2018. "Simplifying ARM concurrency:
>  	multicopy-atomic axiomatic and operational models for ARMv8". In
>  	Proceedings of the ACM on Programming Languages, Volume 2, Issue
> @@ -73,18 +77,18 @@ o	Christopher Pulte, Shaked Flur, Will Deacon, Jon French,
>  Linux-kernel memory model
>  =========================
>  
> -o	Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
> +-	Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
>  	Alan Stern.  2018. "Frightening small children and disconcerting
>  	grown-ups: Concurrency in the Linux kernel". In Proceedings of
>  	the 23rd International Conference on Architectural Support for
>  	Programming Languages and Operating Systems (ASPLOS 2018). ACM,
>  	New York, NY, USA, 405-418.  Webpage: http://diy.inria.fr/linux/.
>  
> -o	Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
> +-	Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
>  	Alan Stern.  2017.  "A formal kernel memory-ordering model (part 1)"
>  	Linux Weekly News.  https://lwn.net/Articles/718628/
>  
> -o	Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
> +-	Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
>  	Alan Stern.  2017.  "A formal kernel memory-ordering model (part 2)"
>  	Linux Weekly News.  https://lwn.net/Articles/720550/
>  
> @@ -92,16 +96,16 @@ o	Jade Alglave, Luc Maranget, Paul E. McKenney, Andrea Parri, and
>  Memory-model tooling
>  ====================
>  
> -o	Daniel Jackson. 2002. "Alloy: A Lightweight Object Modelling
> +-	Daniel Jackson. 2002. "Alloy: A Lightweight Object Modelling
>  	Notation". ACM Trans. Softw. Eng. Methodol. 11, 2 (April 2002),
>  	256–290. http://doi.acm.org/10.1145/505145.505149
>  
> -o	Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. "Herding
> +-	Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. "Herding
>  	Cats: Modelling, Simulation, Testing, and Data Mining for Weak
>  	Memory". ACM Trans. Program. Lang. Syst. 36, 2, Article 7 (July
>  	2014), 7:1–7:74 pages.
>  
> -o	Jade Alglave, Patrick Cousot, and Luc Maranget. 2016. "Syntax and
> +-	Jade Alglave, Patrick Cousot, and Luc Maranget. 2016. "Syntax and
>  	semantics of the weak consistency model specification language
>  	cat". CoRR abs/1608.07531 (2016). http://arxiv.org/abs/1608.07531
>  
> @@ -109,6 +113,6 @@ o	Jade Alglave, Patrick Cousot, and Luc Maranget. 2016. "Syntax and
>  Memory-model comparisons
>  ========================
>  
> -o	Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun
> +-	Paul E. McKenney, Ulrich Weigand, Andrea Parri, and Boqun
>  	Feng. 2016. "Linux-Kernel Memory Model". (6 June 2016).
>  	http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0124r2.html.
> diff --git a/tools/memory-model/README b/tools/memory-model/README
> index 2b87f3971548..04bb1fa9ed76 100644
> --- a/tools/memory-model/README
> +++ b/tools/memory-model/README
> @@ -105,16 +105,16 @@ for more information.
>  DESCRIPTION OF FILES
>  ====================
>  
> -Documentation/cheatsheet.txt
> +Documentation/cheatsheet.rst
>  	Quick-reference guide to the Linux-kernel memory model.
>  
> -Documentation/explanation.txt
> +Documentation/explanation.rst
>  	Describes the memory model in detail.
>  
> -Documentation/recipes.txt
> +Documentation/recipes.rst
>  	Lists common memory-ordering patterns.
>  
> -Documentation/references.txt
> +Documentation/references.rst
>  	Provides background reading.
>  
>  linux-kernel.bell
> @@ -173,7 +173,7 @@ The Linux-kernel memory model has the following limitations:
>  	of READ_ONCE() and WRITE_ONCE() limits the compiler's ability
>  	to optimize, but there is Linux-kernel code that uses bare C
>  	memory accesses.  Handling this code is on the to-do list.
> -	For more information, see Documentation/explanation.txt (in
> +	For more information, see Documentation/explanation.rst (in
>  	particular, the "THE PROGRAM ORDER RELATION: po AND po-loc"
>  	and "A WARNING" sections).
>  
> 
>