On 09/28, Paul E. McKenney wrote:
>
> On Fri, Sep 28, 2007 at 06:47:14PM +0400, Oleg Nesterov wrote:
> > Ah, I was confused by the comment,
> >
> >     smp_mb(); /* Don't call for memory barriers before we see zero. */
> >                                                 ^^^^^^^^^^^^^^^^^^
> > So, in fact, we need this barrier to make sure that _other_ CPUs see these
> > changes in order, thanks. Of course, _we_ already saw zero.
>
> Fair point!
>
> Perhaps: "Ensure that all CPUs see their rcu_mb_flag -after- the
> rcu_flipctrs sum to zero" or some such?
>
> > But in that particular case this doesn't matter, rcu_try_flip_waitzero()
> > is the only function which reads the "non-local" per_cpu(rcu_flipctr), so
> > it doesn't really need the barrier? (besides, it is always called under
> > fliplock).
>
> The final rcu_read_unlock() that zeroed the sum was -not- under fliplock,
> so we cannot necessarily rely on locking to trivialize all of this.

Yes, but still I think this mb() is not necessary. Because we don't need
the "if we saw rcu_mb_flag we must see sum(lastidx)==0" property. When
another CPU calls rcu_try_flip_waitzero(), it will use another lastidx.
OK, minor issue, please forget.

> > OK, the last (I promise :) off-topic question. When CPU 0 and 1 share a
> > store buffer, the situation is simple, we can replace "CPU 0 stores" with
> > "CPU 1 stores". But what if CPU 0 is equally "far" from CPUs 1 and 2?
> >
> > Suppose that CPU 1 does
> >
> >     wmb();
> >     B = 0
> >
> > Can we assume that CPU 2 doing
> >
> >     if (B == 0) {
> >         rmb();
> >
> > must see all invalidations from CPU 0 which were seen by CPU 1 before wmb() ?
>
> Yes. CPU 2 saw something following CPU 1's wmb(), so any of CPU 2's
> reads following its rmb() must therefore see all of CPU 1's stores
> preceding the wmb().

Ah, but I asked a different question. We must see CPU 1's stores by
definition, but what about CPU 0's stores (which could be seen by CPU 1)?
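For reference, the two-CPU pairing that Paul's answer covers can be sketched in userspace C. This is only an illustration under stated assumptions, not kernel code: wmb() and rmb() are mapped onto C11 release/acquire fences (which order somewhat more than the kernel primitives do), and the names payload, stale_reads and mp_stale_reads() are hypothetical. It shows only that a reader which observes B == 0 and then executes rmb() also observes the store CPU 1 made before its wmb().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* ASSUMPTION: C11 fences as userspace stand-ins for the kernel barriers.
 * They are not an exact equivalent of wmb()/rmb(). */
#define wmb() atomic_thread_fence(memory_order_release)
#define rmb() atomic_thread_fence(memory_order_acquire)

static atomic_int payload;  /* hypothetical store made before the wmb() */
static atomic_int B;        /* the flag from the question               */
static int stale_reads;     /* written by cpu2 only, read after join    */

static void *cpu1(void *unused)
{
    (void)unused;
    atomic_store_explicit(&payload, 1, memory_order_relaxed);
    wmb();
    atomic_store_explicit(&B, 0, memory_order_relaxed);
    return NULL;
}

static void *cpu2(void *unused)
{
    (void)unused;
    if (atomic_load_explicit(&B, memory_order_relaxed) == 0) {
        rmb();
        /* Having seen B == 0, we must also see CPU 1's earlier store. */
        if (atomic_load_explicit(&payload, memory_order_relaxed) != 1)
            stale_reads++;
    }
    return NULL;
}

/* Run the two threads `rounds` times; return how often cpu2 saw B == 0
 * but a stale payload. */
int mp_stale_reads(int rounds)
{
    stale_reads = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t t1, t2;
        int rc;

        atomic_store(&payload, 0);
        atomic_store(&B, 1);
        rc = pthread_create(&t1, NULL, cpu1, NULL);
        assert(rc == 0);
        rc = pthread_create(&t2, NULL, cpu2, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
    }
    return stale_reads;
}
```

Under the C11 model the return value is always zero. That says nothing yet about stores from a third CPU.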
Let's take a "real life" example,

    A = B = X = 0;
    P = Q = &A;

    CPU_0           CPU_1           CPU_2

    P = &B;         *P = 1;         if (X) {
                    wmb();              rmb();
                    X = 1;              BUG_ON(*P != 1 && *Q != 1);
                                    }

So, is it possible that CPU_1 sees P == &B, but CPU_2 sees P == &A ?

> The other approach would be to simply have a separate thread for this
> purpose. Batching would amortize the overhead (a single trip around the
> CPUs could satisfy an arbitrarily large number of synchronize_sched()
> requests).

Yes, this way we don't need to uglify migration_thread(). OTOH, we need
another kthread ;)

Oleg.
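The three-CPU example with P, Q and X can be written as a userspace litmus test. This is a sketch under loud assumptions: wmb()/rmb() are again replaced by C11 release/acquire fences, which are stronger than the kernel primitives (in particular they order CPU_1's load of P against its store to X). So it shows what the C11 model guarantees, and does not settle what a given architecture's barriers guarantee. The names bug_hits and wrc_bug_hits() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* ASSUMPTION: C11 fences stand in for, and are stronger than, wmb()/rmb(). */
#define wmb() atomic_thread_fence(memory_order_release)
#define rmb() atomic_thread_fence(memory_order_acquire)

static atomic_int A, B, X;
static _Atomic(atomic_int *) P, Q;
static int bug_hits;        /* written by cpu2 only, read after join */

static void *cpu0(void *unused)
{
    (void)unused;
    atomic_store_explicit(&P, &B, memory_order_relaxed);   /* P = &B; */
    return NULL;
}

static void *cpu1(void *unused)
{
    (void)unused;
    atomic_int *p = atomic_load_explicit(&P, memory_order_relaxed);
    atomic_store_explicit(p, 1, memory_order_relaxed);     /* *P = 1; */
    wmb();
    atomic_store_explicit(&X, 1, memory_order_relaxed);    /* X = 1;  */
    return NULL;
}

static void *cpu2(void *unused)
{
    (void)unused;
    if (atomic_load_explicit(&X, memory_order_relaxed)) {
        rmb();
        atomic_int *p = atomic_load_explicit(&P, memory_order_relaxed);
        atomic_int *q = atomic_load_explicit(&Q, memory_order_relaxed);
        /* Counts the cases where BUG_ON(*P != 1 && *Q != 1) would fire. */
        if (atomic_load_explicit(p, memory_order_relaxed) != 1 &&
            atomic_load_explicit(q, memory_order_relaxed) != 1)
            bug_hits++;
    }
    return NULL;
}

/* Run the three threads `rounds` times; return how often the BUG_ON
 * condition held. */
int wrc_bug_hits(int rounds)
{
    void *(*fn[3])(void *) = { cpu0, cpu1, cpu2 };

    bug_hits = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t t[3];

        atomic_store(&A, 0);                /* A = B = X = 0; */
        atomic_store(&B, 0);
        atomic_store(&X, 0);
        atomic_store(&P, &A);               /* P = Q = &A;    */
        atomic_store(&Q, &A);
        for (int k = 0; k < 3; k++) {
            int rc = pthread_create(&t[k], NULL, fn[k], NULL);
            assert(rc == 0);
            (void)rc;
        }
        for (int k = 0; k < 3; k++)
            pthread_join(t[k], NULL);
    }
    return bug_hits;
}
```

With these fences the count stays at zero: CPU_1's read of P happens before CPU_2's read of P, so by coherence CPU_2 cannot see the older &A once CPU_1 has seen &B. Whether plain wmb()/rmb() give the same cumulativity on every architecture is exactly the open question.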