On Thu, Feb 23, 2017 at 11:33:55PM +0900, Akira Yokosawa wrote: > On 2017/02/23 22:44:41 +0900, Akira Yokosawa wrote: > > On 2017/02/22 21:22:48 +0800, Yubin Ruan wrote: > >> Hi, > >> I am reading chapter 5 and got confused by the answer to Quick Quiz > >> 5.17. So this is an Array-Based Per-thread Eventually Consistent Counter > >> scheme: > >> > >> 1 DEFINE_PER_THREAD(unsigned long, counter); > >> 2 unsigned long global_count; > >> 3 int stopflag; > >> 4 > >> 5 void inc_count(void) > >> 6 { > >> 7 ACCESS_ONCE(__get_thread_var(counter))++; > >> 8 } > >> 9 > >> 10 unsigned long read_count(void) > >> 11 { > >> 12 return ACCESS_ONCE(global_count); > >> 13 } > >> 14 > >> 15 void *eventual(void *arg) > >> 16 { > >> 17 int t; > >> 18 int sum; > >> 19 while (stopflag < 3) { > >> 20 sum = 0; > >> 21 for_each_thread(t) > >> 22 sum += ACCESS_ONCE(per_thread(counter, t)); > >> 23 ACCESS_ONCE(global_count) = sum; > >> 24 poll(NULL, 0, 1); > >> 25 if (stopflag) { > >> 26 smp_mb(); > >> 27 stopflag++; > >> 28 } > >> 29 } > >> 30 return NULL; > >> 31 } > >> 32 > >> 33 void count_init(void) > >> 34 { > >> 35 thread_id_t tid; > >> 36 if (pthread_create(&tid, NULL, eventual, NULL)) { > >> 37 perror("count_init:pthread_create"); > >> 38 exit(-1); > >> 39 } > >> 40 } > >> 41 > >> 42 void count_cleanup(void) > >> 43 { > >> 44 stopflag = 1; > >> 45 while (stopflag < 3) > >> 46 poll(NULL, 0, 1); > >> 47 smp_mb(); > >> 48 } > >> > >> > >> I understand the code. In Quick Quiz 5.17, the question is: > >> > >> Why _doesn't_ the `inc_count()' in the code above need to use atomic > >> instructions? After all, we now have multiple threads accessing the > >> per-thread counters! > >> > >> I think I know the answer to this question: now that you use per-thread > >> variable, you don't need atomic instructions. The scenarios where you > >> need atomic instructions are some places like this: > >> > >> 1 long counter = 0; > >> 2 void inc_count(void) > >> 3 { > >> 4 counter++; //need atomic instruction > >> 5 } > >> 6 > >> 7 long read_count(void) > >> 8 { > >> 9 return counter; > >> 10} > >> > >> > >> But, the answer provided in the book is: > >> > >> <------------------- Answer Begin ----------------------> > >> Because one of the two threads only reads, and because > >> the variable is aligned and machine-sized, non-atomic instructions > >> suffice. > > > > Well, I think this sentence explains everything necessary. > > Since the per-thread counters are also read-accessed from eventual() thread, > > the the increment of the count should look "atomic" from eventual(). > > If the variable is *not* machine-sized or unaligned, the increment > > access might need multiple read / write accesses and it would be not > > "atomic" from other thread's point of view. > > This was somewhat confusing. Let me try again. > > Since the per-thread counters are also read-accessed from eventual() thread, > the write access of the increment of the count should look "atomic" from eventual(). > If the variable is *not* machine-sized nor aligned, the write of the modified value > might need multiple read / write accesses and they would not be 'atomic" from > other threads' points of view. > > Incrementing a counter is a Read-Modify-Write access. In per-thread counter, > it is not necessary to make the whole RMW access atomic. Only the write of the > incremented value needs to look "atomic". > > By the way, an "unsigned long" variable is machine-sized and aligned on > most compiler-platform combination, but there might be exceptions I suppose. Good explanation! > Thanks, Akira > > > > As as side note, you can see the changes made in this Quick Quiz and Answer > > in the git history. Actually, the original version (dated 2011-01-01, > > git tag:v2011.01.02a) of the Quiz said: > > > > Why does inc_count() in > > Figure 5.? (whatever figure number at that time) > > need to use atomic instructions? > > > > This was the opposite of the current version. > > > >> That said, the ACCESS_ONCE() macro is used to prevent > >> compiler optimizations that might otherwise prevent the counter > >> updates from becoming visible to eventual() [Cor12]. > >> > >> An older version of this algorithm did in fact use atomic > >> instructions, kudos to Ersoy Bayramoglu for pointing > >> out that they are in fact unnecessary. That said, atomic > >> ~~~~~~~~~~~~~~~~~ > >> instructions would be needed in cases where the per-thread > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > >> counter variables were smaller than the global global_count. > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > I suppose Paul added this sentence to explain the case where > > per-thread counter is shorter than machine-size. > > > > I'm sure Paul will give some comment once he recalls the circumstances ;-) > > Now added explicit CC: Paul. Sorry for the slow reponse, I am on travel, and am likely to be completely off the grid for the next 24-48 hours. (Hiking!) But yes, checking the git history can be quite interesting, both in terms of evolution of the algorithm and spotting my past mistakes. ;-) Thanx, Paul > > Thanks, Akira > > > >> However, note that on a 32-bit system, the per-thread counter > >> variables might need to be limited to 32 bits in order to sum > >> them accurately, but with a 64-bit global_count variable to avoid > >> overflow. In this case, it is necessary to zero the per-thread > >> counter variables periodically in order to avoid overflow. It is > >> extremely important to note that this zeroing cannot be delayed too long > >> or overflow of the smaller per-thread variables will result. This > >> approach therefore imposes real-time requirements on the underlying > >> system, and in turn must be used with extreme care. > >> > >> In contrast, if all variables are the same size, overflow > >> of any variable is harmless because the eventual sum will > >> be modulo the word size. > >> <------------------------ End --------------------------> > >> > >> Although more complicated than I think, I totally fine with this answer, > >> except the sentence with ~~~ under it. Why is that? Why do we need > >> atomic instructions when counter variables were smaller than the global > >> `global_count' ? Also, the second sentence of the question seems to hint > >> about cache coherency, but I cannot see the point :( > >> > >> regards, > >> Yubin Ruan > >> -- > >> To unsubscribe from this list: send the line "unsubscribe perfbook" in > >> the body of a message to majordomo@xxxxxxxxxxxxxxx > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > -- To unsubscribe from this list: send the line "unsubscribe perfbook" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html