Re: Question: Why does the modification of seq in write_seqlock not require WRITE_ONCE?

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Tue, 25 Apr 2023 10:50:30 -0700

On Wed, Apr 26, 2023 at 01:06:57AM +0800, Alan Huang wrote:
> Hi Akira,
> 
> > 2023年4月25日 23:33，Akira Yokosawa <akiyks@xxxxxxxxx> 写道：
> > 
> > Hi Alan,
> > 
> > On Tue, 25 Apr 2023 21:52:48 +0800, Alan Huang wrote:
> >> Hi,
> >> 
> >> I noticed that the modifications of seq in write_seqlock and write_sequnlock of Listing 9.10 use plain ++ operation.
> >> But as Chapter 4 (especially 4.3.4.4) says, there will be store tearing since there are concurrent readers.
> >> 
> >> However, the kernel implementation of sequence lock also uses plain ++ operation.
> >> 
> >> I’m somewhat confused.
> >> 
> >> Why does the modification of seq in write_seqlock not require WRITE_ONCE?
> > 
> > Good question...
> > 
> > I don't have a straight answer, but the history of include/linux/seqlock.h
> > says those plain increments predate the introduction of ACCESS_ONCE
> > (predecessor to READ_ONCE/WRITE_ONCE) to the Linux kernel.
> > 
> > The code at the time (pre v2.6.0) looked like this:
> > 
> > static inline void write_seqlock(seqlock_t *sl)
> > {
> > spin_lock(&sl->lock);
> > ++sl->sequence;
> > smp_wmb(); 
> > } 
> > 
> > static inline void write_sequnlock(seqlock_t *sl) 
> > {
> > smp_wmb();
> > sl->sequence++;
> > spin_unlock(&sl->lock);
> > }
> > 
> > static inline int write_tryseqlock(seqlock_t *sl)
> > {
> > int ret = spin_trylock(&sl->lock);
> > 
> > if (ret) {
> > ++sl->sequence;
> > smp_wmb(); 
> > }
> > return ret;
> > }
> > 
> > The purpose of WRITE_ONCE() would be to suppress compiler optimization
> > of write accesses.  In the code above, smp_wmb(), spin_lock(),
> > spin_unlock(), and spin_trylock() all imply compiler barriers.
> > So, there is not much room for compilers to optimize the store part
> > of increments.
> 
> But unsigned long is at least 32 bits, here is the COMPILER BARRIER section of Documentation/memory-barriers.txt:
> 
> 	For example, given an architecture having 16-bit store instructions with 7-bit immediate fields, the compiler
> 	might be tempted to use two 16-bit store-immediate instructions to implement the following 32-bit store:
> 		p = 0x00010002;
> 
> The kernel defines sequence as unsigned, which is at least 16 bits, I don’t know if there exists an architecture having 16-bit store instructions with 32-bit unsigned…

There have been CPUs and compilers that would use a pair of 16-bit
store-immediate instructions to store a 32-bit constant.  I don't know
of any situation where a compiler would use 16-bit loads or stores for
a 32-bit increment, but there is no law forbidding the compiler from
doing so.

To answer the higher-level question, once we had the Linux-kernel memory
model in place, I focused more on explaining it and clearly not enough
on bringing the perfook code samples into line with it.

So I would welcome a patch making sequence locking use READ_ONCE()
and WRITE_ONCE() to do the increments.  The Linux kernel might not
be so excited about the corresponding patch because it could result
in slightly worse code being generated, for example, a++ might use
an x86 add-to-memory instruction while the READ_ONCE()/WRITE_ONCE()
counterpart would load, increment, and store.  Though I hear rumors that
some compilers might start better optimizing this.

							Thanx, Paul

> Thanks,
> Alan
> 
> > 
> > I think those plain-looking increments are safe as far as current
> > compilers are concerned.
> > 
> > Does this explanation help you?
> > 
> > Paul, please chime in if I'm missing something.
> > 
> >        Thanks, Akira
> > 
> >> 
> >> Thanks,
> >> Alan
>