On Fri, Sep 10, 2021 at 12:36:32PM -0400, Alan Stern wrote:
> On Fri, Sep 10, 2021 at 10:20:13PM +0800, Boqun Feng wrote:
> > On Thu, Sep 09, 2021 at 11:00:05AM -0700, Paul E. McKenney wrote:
> > [...]
> > > 
> > > Boqun, I vaguely remember a suggested change from you along these lines,
> > > but now I cannot find it.  Could you please send it as a formal patch
> > > if you have not already done so or point me at it if you have?
> > > 
> > 
> > Here is a draft patch based on the change I did when I discussed with
> > Peter, and I really want to hear Alan's thoughts first. Ideally, we
> > should also have related litmus tests and send them to the linux-arch
> > list so that we know the ordering is provided by every architecture.
> > 
> > Regards,
> > Boqun
> > 
> > --------------------------------->8
> > Subject: [PATCH] tools/memory-model: Provide extra ordering for
> >  lock-{release,acquire} on the same CPU
> > 
> > A recent discussion[1] shows that we are in favor of strengthening the
> > ordering of lock-release + lock-acquire on the same CPU: a lock-release
> > and a po-after lock-acquire should provide the so-called RCtso ordering,
> > that is, a memory access S po-before the lock-release should be ordered
> > against a memory access R po-after the lock-acquire, unless S is a store
> > and R is a load.
> > 
> > The strengthening meets programmers' expectation for a "sequence of two
> > locked regions to be ordered wrt each other" (from Linus), and can
> > reduce the mental burden when using locks. Therefore add it to the LKMM.
> > 
> > [1]: https://lore.kernel.org/lkml/20210909185937.GA12379@xxxxxxxxxxxxxxxxxxx/
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> > ---
> 
> The change to linux-kernel.cat looks fine.  However, I didn't like your
> update to explanation.txt.  Instead I wrote my own, given below.
> 

Thanks. Indeed your version of the explanation is better.

> I also wrote a couple of litmus tests which Paul can add to the
> appropriate archive.  They are attached to this email.  As expected,
> they fail (result Sometimes) with the current LKMM and succeed (Never)
> with Boqun's updated model.
> 

Appreciate it. I will put together your change to explanation.txt (with
the typo fixed), my change to the cat file, and the litmus tests, and
send a proper patch next Monday.

Regards,
Boqun
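[A side note on the "unless S is a store and R is a load" carve-out in
the commit message above: the one case RCtso deliberately leaves
unordered can be made concrete with a litmus test in the same style as
Alan's attachments below. The test name ullk-wr and its exact shape are
a sketch, not part of the patch or the attachments; it should remain
allowed (result Sometimes) under both the current and the strengthened
model, since a store po-before the lock-release is not ordered against
a load po-after the lock-acquire.]

C ullk-wr

(*
 * Result: Sometimes
 *
 * Even on the same CPU, unlock+lock does not order a store in the first
 * critical section against a load in the second: RCtso leaves the
 * store-to-load case unordered, so this SB-style outcome stays allowed.
 *)

{}

P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
{
	int r1;

	spin_lock(s);
	WRITE_ONCE(*x, 1);
	spin_unlock(s);
	spin_lock(t);
	r1 = READ_ONCE(*y);
	spin_unlock(t);
}

P1(int *x, int *y)
{
	int r2;

	WRITE_ONCE(*y, 1);
	smp_mb();
	r2 = READ_ONCE(*x);
}

exists (0:r1=0 /\ 1:r2=0)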
> Alan
> 
> 
> --- usb-devel.orig/tools/memory-model/Documentation/explanation.txt
> +++ usb-devel/tools/memory-model/Documentation/explanation.txt
> @@ -1813,15 +1813,16 @@ spin_trylock() -- we can call these thin
>  lock-acquires -- have two properties beyond those of ordinary releases
>  and acquires.
>  
> -First, when a lock-acquire reads from a lock-release, the LKMM
> -requires that every instruction po-before the lock-release must
> -execute before any instruction po-after the lock-acquire.  This would
> -naturally hold if the release and acquire operations were on different
> -CPUs, but the LKMM says it holds even when they are on the same CPU.
> -For example:
> +First, when a lock-acquire reads from or is po-after a lock-release,
> +the LKMM requires that every instruction po-before the lock-release
> +must execute before any instruction po-after the lock-acquire.  This
> +would naturally hold if the release and acquire operations were on
> +different CPUs and accessed the same lock variable, but the LKMM says
> +it also holds when they are on the same CPU, even if they access
> +different lock variables.  For example:
>  
>  	int x, y;
> -	spinlock_t s;
> +	spinlock_t s, t;
>  
>  	P0()
>  	{
> @@ -1830,9 +1831,9 @@ For example:
>  		spin_lock(&s);
>  		r1 = READ_ONCE(x);
>  		spin_unlock(&s);
> -		spin_lock(&s);
> +		spin_lock(&t);
>  		r2 = READ_ONCE(y);
> -		spin_unlock(&s);
> +		spin_unlock(&t);
>  	}
>  
>  	P1()
>  	{
> @@ -1842,10 +1843,10 @@ For example:
>  		WRITE_ONCE(x, 1);
>  	}
>  
> -Here the second spin_lock() reads from the first spin_unlock(), and
> -therefore the load of x must execute before the load of y.  Thus we
> -cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
> -MP pattern).
> +Here the second spin_lock() is po-after the first spin_unlock(), and
> +therefore the load of x must execute before the load of y, even though
> +the two locking operations use different locks.  Thus we cannot have
> +r1 = 1 and r2 = 0 at the end (this is an instance of the MP pattern).
>  
>  This requirement does not apply to ordinary release and acquire
>  fences, only to lock-related operations.  For instance, suppose P0()
> @@ -1872,13 +1873,13 @@ instructions in the following order:
>  
>  and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
>  
> -Second, when a lock-acquire reads from a lock-release, and some other
> -stores W and W' occur po-before the lock-release and po-after the
> -lock-acquire respectively, the LKMM requires that W must propagate to
> -each CPU before W' does.  For example, consider:
> +Second, when a lock-acquire reads from or is po-after a lock-release,
> +and some other stores W and W' occur po-before the lock-release and
> +po-after the lock-acquire respectively, the LKMM requires that W must
> +propagate to each CPU before W' does.  For example, consider:
>  
>  	int x, y;
> -	spinlock_t x;
> +	spinlock_t s;
>  
>  	P0()
>  	{
> @@ -1908,7 +1909,12 @@ each CPU before W' does.  For example, c
>  
>  If r1 = 1 at the end then the spin_lock() in P1 must have read from
>  the spin_unlock() in P0.  Hence the store to x must propagate to P2
> -before the store to y does, so we cannot have r2 = 1 and r3 = 0.
> +before the store to y does, so we cannot have r2 = 1 and r3 = 0.  But
> +if P1 had used a lock variable different from s, the writes could have
> +propagated in either order.  (On the other hand, if the code in P0 and
> +P1 had all executed on a single CPU, as in the example before this
> +one, then the writes would have propagated in order even if the two
> +critical sections used different lock variables.)
>  
>  These two special requirements for lock-release and lock-acquire do
>  not arise from the operational model.  Nevertheless, kernel developers


> C ullk-rw
> 
> (*
>  * Result: Never
>  *
>  * If two locked critical sections execute on the same CPU, all accesses
>  * in the first must execute before any accesses in the second, even if
>  * the critical sections are protected by different locks.
>  *)
> 
> {}
> 
> P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
> {
> 	int r1;
> 
> 	spin_lock(s);
> 	r1 = READ_ONCE(*x);
> 	spin_unlock(s);
> 	spin_lock(t);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(t);
> }
> 
> P1(int *x, int *y)
> {
> 	int r2;
> 
> 	r2 = smp_load_acquire(y);
> 	WRITE_ONCE(*x, 1);
> }
> 
> exists (0:r1=1 /\ 1:r2=1)
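[For anyone wanting to reproduce Alan's results: assuming the test
above is saved as ullk-rw.litmus in the tools/memory-model directory,
it can be checked with herd7 from that directory, e.g.:

	$ herd7 -conf linux-kernel.cfg ullk-rw.litmus

With the current linux-kernel.cat the Observation line should read
Sometimes; with Boqun's strengthened cat file it should read Never, as
Alan notes above. The same invocation applies to ullk-ww below.]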
> C ullk-ww
> 
> (*
>  * Result: Never
>  *
>  * If two locked critical sections execute on the same CPU, stores in the
>  * first must propagate to each CPU before stores in the second do, even if
>  * the critical sections are protected by different locks.
>  *)
> 
> {}
> 
> P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
> {
> 	spin_lock(s);
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> 	spin_lock(t);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(t);
> }
> 
> P1(int *x, int *y)
> {
> 	int r1;
> 	int r2;
> 
> 	r1 = READ_ONCE(*y);
> 	smp_rmb();
> 	r2 = READ_ONCE(*x);
> }
> 
> exists (1:r1=1 /\ 1:r2=0)
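[One more sketch, mirroring the explanation.txt passage saying that
this requirement "does not apply to ordinary release and acquire
fences": replacing P0's unlock+lock pair with a store-release and a
load-acquire of the same variable gives a test that should remain
allowed (result Sometimes) under both models, because the CPU may
forward the released value to the acquire load. The name rel-acq-rr and
the exact form are assumptions, not from Alan's attachments.]

C rel-acq-rr

(*
 * Result: Sometimes
 *
 * Unlike a lock-release and lock-acquire, a store-release followed by a
 * load-acquire of the same variable on the same CPU does not order the
 * surrounding accesses: the released value can be forwarded to the
 * acquire load, so the loads of x and y may execute out of order.
 *)

{}

P0(int *s, int *x, int *y)
{
	int r1;
	int r2;
	int r3;

	r1 = READ_ONCE(*x);
	smp_store_release(s, 1);
	r3 = smp_load_acquire(s);
	r2 = READ_ONCE(*y);
}

P1(int *x, int *y)
{
	WRITE_ONCE(*y, 1);
	smp_wmb();
	WRITE_ONCE(*x, 1);
}

exists (0:r1=1 /\ 0:r2=0)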