On Tue, Feb 25, 2020 at 11:30:57AM +0800, Junchang Wang wrote: > On Sun, Feb 23, 2020 at 06:45:49PM -0800, Paul E. McKenney wrote: > >On Mon, Feb 24, 2020 at 08:42:09AM +0900, Akira Yokosawa wrote: > >> On 2020/02/24 0:33, Paul E. McKenney wrote: > >> > Hello! > >> > > >> > I finally found and fixed the rcu_barrier() bug [1], so I should again > >> > be able to devote some big-system test time to redoing performance > >> > results in perfbook. Once that is done, I expect that it is time for > >> > the second edition. > >> > > >> > I might also convert the blank page hiding the solution to the Dining > >> > Philosophers Problem to a quick quiz, but I consider this optional. > >> > > >> > Are there any other changes that are needed? [2] > >> > >> In response to Junchang's (off the list) proposal, I noticed that > >> Figure 10.27 needs update to reflect the change in the code done > >> in early 2019. > >> > >> Can you update it? > >> > >> The change simplified the lookup side, but doubled the cost of > >> updates during resizing. So it is likely the discussion in the text > >> also needs update. > > > >I will update this when rerunning on the large system in any case, > >but thank you for calling my attention to this. Yes, it does need > >to be fixed. (My problem in early 2019 was not having access to > >a large system, now solved!) > > > > Hi Paul and Akira, > > I think you are discussing Figure 10.17 (Overhead of Resizing Hash > Tables Between 1024 ...). Is that right? > > Here is a comment on this performance evaluation. Test framework > hashtorture.h indirectly controls the number of elements in a hash table > (and the average load factor) by inserting nupdaters*elperupdater/2 > elements before running a test, such that U = nupdaters*elperupdater, > where U is the size of the pool of all of the test elements. This > approach worked well for old machines. One thing worth noting is that U > could be too small for currently available servers which typically have > larger Last Level Caches of dozens of megabytes. For example, the LLC of > the E5 CPU I used to evaluate rehash can perfectly hold 131,072*2 > elements, and hence lookup operations barely reach the memory (The Linux > perf reports close to zero cache misses.) Update/resize operations > update some elements and invalidate some cachelines, but it seems the > hardware prefetcher works well in this case, given that the test set U > is relatively small. > > If we want to test the hash algorithm in an environment where lookup > operations may need to fetch data from memory, U should be set to a much > bigger number (e.g., 10M). But a bigger U means a bigger number of > pre-inserted elements if the above mentioned approach is used. So in my > experiments, I extended hashtorture.h and broke the relationship between > these two numbers. Please let me know if there are any other solutions. > > But like you said, the code and the evaluation figure should be > simplified for a textbook. So I think it's reasonable that you still use > the existing approach, but the comment may worth being mentioned in the > perfbook to better explain your experimental result. Here is hoping that you are not increasing the hash-table per-bucket load factor as you increase the number of CPUs, as one unfortunate recent paper did. Who is reviewing these things, anyway??? ;-) Thanx, Paul > Thanks, > --Junchang > > > > Thanx, Paul > > > >> Thanks, Akira > >> > >> > > >> > Thanx, Paul > >> > > >> > > >> > [1] The fix is at 77abca1c358a ("rcu: Make rcu_barrier() account for > >> > offline no-CBs CPUs") in -rcu, in case anyone is curious. > >> > > >> > [2] Here is a list of some things that I believe can follow the second > >> > edition: > >> > > >> > Expand lock-free algorithm discussion to include LIFO push, > >> > illustrating the infamous pointer-zap issue. (See ISO SC22 > >> > WG21 P1726R3, which should appear in a couple of weeks, for > >> > more details.) > >> > > >> > Add text describing the Issaquah Challenge. > >> > > >> > Add text describing skiplists, one of the more concurrency > >> > friendly data structures. > >> > > >> > Add text describing data-race detectors such as KCSAN. (This needs > >> > to wait for more Linux-kernel experience.) > >> > > >> > Additional material from todo.txt. ;-) > >> >