On Fri, Feb 17, 2017 at 05:20:30PM +0800, Yubin Ruan wrote:
> On 2017/02/17 16:45, Yubin Ruan wrote:
> >
> > On 2017/02/17 02:58, Paul E. McKenney wrote:
> >> On Tue, Feb 14, 2017 at 06:35:05PM +0800, Yubin Ruan wrote:
> >>> On 2017/2/14 3:06, Paul E. McKenney wrote:
> >>>> On Mon, Feb 13, 2017 at 09:55:50PM +0800, Yubin Ruan wrote:
> >>>>> It has been mentioned in the book that there are three kinds of
> >>>>> memory barriers: smp_rmb(), smp_wmb(), and smp_mb().
> >>>>>
> >>>>> I am confused about their actual semantics.
> >>>>>
> >>>>> The book says (B.5 paragraph 2, perfbook2017.01.02a):
> >>>>>
> >>>>> for smp_rmb():
> >>>>>     "The effect of this is that a read memory barrier orders
> >>>>>     only loads on the CPU that executes it, so that all loads
> >>>>>     preceding the read memory barrier will appear to have
> >>>>>     completed before any load following the read memory
> >>>>>     barrier"
> >>>>>
> >>>>> for smp_wmb():
> >>>>>     "so that all stores preceding the write memory barrier will
> >>>>>     appear to have completed before any store following the
> >>>>>     write memory barrier"
> >>>>>
> >>>>> I wonder, is there any primitive "X" which can guarantee
> >>>>>     "that all *loads* preceding the X will appear to have completed
> >>>>>     before any *store* following the X"
> >>>>>
> >>>>> and similarly:
> >>>>>     "that all *stores* preceding the X will appear to have completed
> >>>>>     before any *load* following the X"?
> >>>
> >>> I am reading the material you provided.
> >>> So, there is no short (yes/no) answer to the questions above? (I mean
> >>> about the primitive X.)
> >>
> >> For smp_mb(), the full memory barrier, things are pretty simple.
> >> All CPUs will agree that all accesses by any CPU preceding a given
> >> smp_mb() happened before any accesses by that same CPU following that
> >> same smp_mb(). Full memory barriers are also transitive, so that you
> >> can reason (relatively) easily about situations involving many CPUs.
>
> One more thing about the full memory barrier.
> You say *all CPUs agree*. That does not include Alpha, right?

It does include Alpha. Remember that Alpha's peculiarities occur when
you -don't- have full memory barriers. If you have a full memory
barrier between each pair of accesses, then everything will be ordered
on pretty much every type of CPU. The one exception that I am aware of
is Itanium, which also requires that the stores be converted to
store-release instructions.

							Thanx, Paul

> regards,
> Yubin Ruan
>
> >> For smp_rmb() and smp_wmb(), not so much. The canonical example showing
> >> the complexity of smp_wmb() is called "R":
> >>
> >>     Thread 0                Thread 1
> >>     --------                --------
> >>     WRITE_ONCE(x, 1);       WRITE_ONCE(y, 2);
> >>     smp_wmb();              smp_mb();
> >>     WRITE_ONCE(y, 1);       r1 = READ_ONCE(x);
> >>
> >> One might hope that if the final value of y is 2, then the value of
> >> r1 must be 1. People hoping this would be disappointed, because
> >> there really is hardware that will allow the outcome y == 2 && r1 == 0.
> >>
> >> See the following URL for many more examples of this sort of thing:
> >>
> >>     https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test6.pdf
> >>
> >> For more information, including some explanation of the nomenclature,
> >> see:
> >>
> >>     https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> >>
> >> There are formal memory models that account for this, and in fact this
> >> appendix is slated to be rewritten based on some work a group of us have
> >> been doing over the past two years or so. A tarball containing a draft
> >> of this work is attached. I suggest starting with index.html. If
> >> you get a chance to look it over, I would value any suggestions that
> >> you might have.
> >
> > Thanks for your reply. I will take some time to read those materials.
> > Discussions with you really help eliminate some of my doubts. Hopefully
> > we can have more discussions in the future.
> >
> > regards,
> > Yubin Ruan
> >
> >>>>> I know I can use the general smp_mb() for that, but that is a little
> >>>>> too general.
> >>>>>
> >>>>> Do I miss/mix anything?
> >>>>
> >>>> Well, the memory-ordering material is a bit dated. There is some work
> >>>> underway to come up with a better model, and I presented on it a couple
> >>>> of weeks ago:
> >>>>
> >>>> http://www.rdrop.com/users/paulmck/scalability/paper/LinuxMM.2017.01.19a.LCA.pdf
> >>>>
> >>>> This presentation calls out a tarball that includes some .html files
> >>>> that have much better explanations, and this wording will hopefully
> >>>> be reflected in an upcoming version of the book. Here is a direct
> >>>> URL for the tarball:
> >>>>
> >>>> http://www.rdrop.com/users/paulmck/scalability/paper/LCA-LinuxMemoryModel.2017.01.15a.tgz
> >>>>
> >>>>							Thanx, Paul
> >>>>
> >>> regards,
> >>> Yubin Ruan
> >>>
--
To unsubscribe from this list: send the line "unsubscribe perfbook" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html