On 2017/02/18 01:32, Paul E. McKenney wrote:
> On Sat, Feb 18, 2017 at 12:22:01AM +0800, Yubin Ruan wrote:
>> On 2017/02/17 23:35, Paul E. McKenney wrote:
>>> On Fri, Feb 17, 2017 at 05:20:30PM +0800, Yubin Ruan wrote:
>>>> On 2017/02/17 16:45, Yubin Ruan wrote:
>>>>> On 2017/02/17 02:58, Paul E. McKenney wrote:
>>>>>> On Tue, Feb 14, 2017 at 06:35:05PM +0800, Yubin Ruan wrote:
>>>>>>> On 2017/2/14 3:06, Paul E. McKenney wrote:
>>>>>>>> On Mon, Feb 13, 2017 at 09:55:50PM +0800, Yubin Ruan wrote:
>>>>>>>>> The book mentions that there are three kinds of memory
>>>>>>>>> barriers: smp_rmb(), smp_wmb(), and smp_mb().
>>>>>>>>>
>>>>>>>>> I am confused about their actual semantics.
>>>>>>>>>
>>>>>>>>> The book says (B.5 paragraph 2, perfbook2017.01.02a):
>>>>>>>>>
>>>>>>>>> for smp_rmb():
>>>>>>>>>   "The effect of this is that a read memory barrier orders
>>>>>>>>>   only loads on the CPU that executes it, so that all loads
>>>>>>>>>   preceding the read memory barrier will appear to have
>>>>>>>>>   completed before any load following the read memory
>>>>>>>>>   barrier"
>>>>>>>>>
>>>>>>>>> for smp_wmb():
>>>>>>>>>   "so that all stores preceding the write memory barrier will
>>>>>>>>>   appear to have completed before any store following the
>>>>>>>>>   write memory barrier"
>>>>>>>>>
>>>>>>>>> I wonder, is there any primitive "X" that can guarantee
>>>>>>>>> that all *loads* preceding the X will appear to have
>>>>>>>>> completed before any *store* following the X, and similarly,
>>>>>>>>> that all *stores* preceding the X will appear to have
>>>>>>>>> completed before any *load* following the X?
>>>>>>>
>>>>>>> I am reading the material you provided.
>>>>>>> So, there is no short (yes/no) answer to the questions above?
>>>>>>> (I mean the primitive X.)
>>>>>>
>>>>>> For smp_mb(), the full memory barrier, things are pretty simple.
>>>>>> All CPUs will agree that all accesses by any CPU preceding a given
>>>>>> smp_mb() happened before any accesses by that same CPU following
>>>>>> that same smp_mb(). Full memory barriers are also transitive, so
>>>>>> that you can reason (relatively) easily about situations involving
>>>>>> many CPUs.
>>>>
>>>> One more thing about the full memory barrier. You say *all CPUs
>>>> agree*. That does not include Alpha, right?
>>>
>>> It does include Alpha. Remember that Alpha's peculiarities occur when
>>> you -don't- have full memory barriers. If you have a full memory
>>> barrier between each pair of accesses, then everything will be ordered
>>> on pretty much every type of CPU.
>>
>> You mean this change would work for Alpha?
>>
>>>  1 struct el *insert(long key, long data)
>>>  2 {
>>>  3   struct el *p;
>>>  4   p = kmalloc(sizeof(*p), GFP_ATOMIC);
>>>  5   spin_lock(&mutex);
>>>  6   p->next = head.next;
>>>  7   p->key = key;
>>>  8   p->data = data;
>>>  9   smp_mb(); /* changed `smp_wmb()' to `smp_mb()' */
>
> No, this would not help.
>
>>> 10   head.next = p;
>>> 11   spin_unlock(&mutex);
>>> 12 }
>>> 13
>>> 14 struct el *search(long key)
>>> 15 {
>>> 16   struct el *p;
>>> 17   p = head.next;
>>> 18   while (p != &head) {
>>> 19     /* BUG ON ALPHA!!! */
>
>	smp_mb();
>
> This is where you need the additional barrier. Note that in the Linux
> kernel, rcu_dereference() and similar primitives provide this barrier
> in Alpha builds.
>
>							Thanx, Paul

Got it. So, regarding memory barriers, I think I was confused about how
one CPU observes the effects of another CPU's memory barriers.
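If I understand your point about rcu_dereference(), the reader side would
look roughly like the sketch below. (This is my own reconstruction, not
code from the book; read-side protection and object lifetime are elided
here, just as in the snippet above.)

	struct el *search(long key)
	{
		struct el *p;

		/* rcu_dereference() includes the barrier Alpha needs */
		p = rcu_dereference(head.next);
		while (p != &head) {
			if (p->key == key)
				return p;
			/* ditto for each ->next load in the traversal */
			p = rcu_dereference(p->next);
		}
		return NULL;
	}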
As you have said, for any CPU, "all accesses by any CPU preceding a given
smp_mb() happened before any accesses by that same CPU following that
same smp_mb()", and all CPUs "agree" with this. But that does not mean
that the other CPUs (e.g., Alpha) will observe that access sequence,
right? Sorry for my annoying obsession with this. Thanks.

regards,
Yubin Ruan

>>> 20     if (p->key == key) {
>>> 21       return (p);
>>> 22     }
>>> 23     p = p->next;
>>> 24   };
>>> 25   return (NULL);
>>> 26 }
>>
>> regards,
>> Yubin Ruan
>>
>>> The one exception that I am aware of is Itanium, which also requires
>>> that the stores be converted to store-release instructions.
>>>
>>>							Thanx, Paul
>>>
>>>> regards,
>>>> Yubin Ruan
>>>>
>>>>>> For smp_rmb() and smp_wmb(), not so much. The canonical example
>>>>>> showing the complexity of smp_wmb() is called "R":
>>>>>>
>>>>>>   Thread 0                  Thread 1
>>>>>>   --------                  --------
>>>>>>   WRITE_ONCE(x, 1);         WRITE_ONCE(y, 2);
>>>>>>   smp_wmb();                smp_mb();
>>>>>>   WRITE_ONCE(y, 1);         r1 = READ_ONCE(x);
>>>>>>
>>>>>> One might hope that if the final value of y is 2, then the value of
>>>>>> r1 must be 1. People hoping this would be disappointed, because
>>>>>> there really is hardware that will allow the outcome
>>>>>> y == 2 && r1 == 0.
>>>>>>
>>>>>> See the following URL for many more examples of this sort of thing:
>>>>>>
>>>>>>   https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf
>>>>>>
>>>>>> For more information, including some explanation of the
>>>>>> nomenclature, see:
>>>>>>
>>>>>>   https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
>>>>>>
>>>>>> There are formal memory models that account for this, and in fact
>>>>>> this appendix is slated to be rewritten based on some work a group
>>>>>> of us have been doing over the past two years or so. A tarball
>>>>>> containing a draft of this work is attached. I suggest starting
>>>>>> with index.html. If you get a chance to look it over, I would value
>>>>>> any suggestions that you might have.
>>>>>
>>>>> Thanks for your reply. I will take some time to read those
>>>>> materials. Discussions with you really help eliminate some of my
>>>>> doubts. Hopefully we can have more discussions in the future.
>>>>>
>>>>> regards,
>>>>> Yubin Ruan
>>>>>
>>>>>>>>> I know I can use the general smp_mb() for that, but that is a
>>>>>>>>> little too general.
>>>>>>>>>
>>>>>>>>> Do I miss/mix anything?
>>>>>>>>
>>>>>>>> Well, the memory-ordering material is a bit dated. There is some
>>>>>>>> work underway to come up with a better model, and I presented on
>>>>>>>> it a couple of weeks ago:
>>>>>>>>
>>>>>>>>   http://www.rdrop.com/users/paulmck/scalability/paper/LinuxMM.2017.01.19a.LCA.pdf
>>>>>>>>
>>>>>>>> This presentation calls out a tarball that includes some .html
>>>>>>>> files that have much better explanations, and this wording will
>>>>>>>> hopefully be reflected in an upcoming version of the book. Here
>>>>>>>> is a direct URL for the tarball:
>>>>>>>>
>>>>>>>>   http://www.rdrop.com/users/paulmck/scalability/paper/LCA-LinuxMemoryModel.2017.01.15a.tgz
>>>>>>>>
>>>>>>>>							Thanx, Paul
>>>>>>>
>>>>>>> regards,
>>>>>>> Yubin Ruan
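P.S. To check my understanding of the "R" example above, I tried
transcribing it into the C-flavored litmus syntax used by the examples in
your tarball. This is my own transcription, so please correct me if I got
the syntax wrong:

	C R

	{}

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		smp_wmb();
		WRITE_ONCE(*y, 1);
	}

	P1(int *x, int *y)
	{
		int r1;

		WRITE_ONCE(*y, 2);
		smp_mb();
		r1 = READ_ONCE(*x);
	}

	exists (y=2 /\ 1:r1=0)

If I understand correctly, the herd tool should report this "exists"
clause as satisfiable, matching the hardware behavior you describe above.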