On Sun, Feb 23, 2020 at 06:45:49PM -0800, Paul E. McKenney wrote: >On Mon, Feb 24, 2020 at 08:42:09AM +0900, Akira Yokosawa wrote: >> On 2020/02/24 0:33, Paul E. McKenney wrote: >> > Hello! >> > >> > I finally found and fixed the rcu_barrier() bug [1], so I should again >> > be able to devote some big-system test time to redoing performance >> > results in perfbook. Once that is done, I expect that it is time for >> > the second edition. >> > >> > I might also convert the blank page hiding the solution to the Dining >> > Philosophers Problem to a quick quiz, but I consider this optional. >> > >> > Are there any other changes that are needed? [2] >> >> In response to Junchang's (off the list) proposal, I noticed that >> Figure 10.27 needs update to reflect the change in the code done >> in early 2019. >> >> Can you update it? >> >> The change simplified the lookup side, but doubled the cost of >> updates during resizing. So it is likely the discussion in the text >> also needs update. > >I will update this when rerunning on the large system in any case, >but thank you for calling my attention to this. Yes, it does need >to be fixed. (My problem in early 2019 was not having access to >a large system, now solved!) > Hi Paul and Akira, I think you are discussing Figure 10.17 (Overhead of Resizing Hash Tables Between 1024 ...). Is that right? Here is a comment on this performance evaluation. Test framework hashtorture.h indirectly controls the number of elements in a hash table (and the average load factor) by inserting nupdaters*elperupdater/2 elements before running a test, such that U = nupdaters*elperupdater, where U is the size of the pool of all of the test elements. This approach worked well for old machines. One thing worth noting is that U could be too small for currently available servers which typically have larger Last Level Caches of dozens of megabytes. For example, the LLC of the E5 CPU I used to evaluate rehash can perfectly hold 131,072*2 elements, and hence lookup operations barely reach the memory (The Linux perf reports close to zero cache misses.) Update/resize operations update some elements and invalidate some cachelines, but it seems the hardware prefetcher works well in this case, given that the test set U is relatively small. If we want to test the hash algorithm in an environment where lookup operations may need to fetch data from memory, U should be set to a much bigger number (e.g., 10M). But a bigger U means a bigger number of pre-inserted elements if the above mentioned approach is used. So in my experiments, I extended hashtorture.h and broke the relationship between these two numbers. Please let me know if there are any other solutions. But like you said, the code and the evaluation figure should be simplified for a textbook. So I think it's reasonable that you still use the existing approach, but the comment may worth being mentioned in the perfbook to better explain your experimental result. Thanks, --Junchang > Thanx, Paul > >> Thanks, Akira >> >> > >> > Thanx, Paul >> > >> > >> > [1] The fix is at 77abca1c358a ("rcu: Make rcu_barrier() account for >> > offline no-CBs CPUs") in -rcu, in case anyone is curious. >> > >> > [2] Here is a list of some things that I believe can follow the second >> > edition: >> > >> > Expand lock-free algorithm discussion to include LIFO push, >> > illustrating the infamous pointer-zap issue. (See ISO SC22 >> > WG21 P1726R3, which should appear in a couple of weeks, for >> > more details.) >> > >> > Add text describing the Issaquah Challenge. >> > >> > Add text describing skiplists, one of the more concurrency >> > friendly data structures. >> > >> > Add text describing data-race detectors such as KCSAN. (This needs >> > to wait for more Linux-kernel experience.) >> > >> > Additional material from todo.txt. ;-) >> >