Hello, LoopJump, Glad you like the book! Adding perfbook@xxxxxxxxxxxxxxx on CC in case someone there has some good ideas. First, I do feel the need to ask if you have actually benchmarked the system and seen maximum computation to be a problem, perhaps on a profile. After all, there is little payoff in optimizing portions of the system that are not bottlenecks! Second, let me make sure that I understand the problem: 1. Your system receives sequence numbers on incoming packets. 2. Each packets is processed, after which the corresponding sequence number should be included in a global "max" computation. 3. If the whole thing isn't protected by a global lock, we know that the thread reading out the maximum value will get a snapshot that immediately becomes stale. The reason for this instant staleness is that some other thread might immediately complete processing for another packet, and that packet's sequence number might update the maximum. 4. I am assuming that the sequence number is large enough that overflow is not a problem. If overflow is possible, things become quite interesting when a given packet's processing takes a very long time. 5. I am assuming that packets are processed very quickly. If this assumption is incorrect and packets take (say) ten milliseconds to process, then you should be able to use pretty much any method of updating the maximum. But if packet processing was that slow, you probably wouldn't be asking me this question. Third, some thoughts on implementing this efficiently. By #3 above, we know that the thread reading out the maximum can have a stale value. Just how much staleness can be tolerated? If (say) a millisecond of staleness is OK, one approach is to have each thread maintain the maximum sequence number that it has encountered thus far. Maintaining this per-thread value should be extremely efficient. Then a separate thread (or function or whatever) can run every millisecond, taking the maximum of the per-thread maxima. Alternatively, each thread can periodically update the global maximum based on that thread's local state. Either way, the point is to update the global maximum less frequently and thus have less problem with cache misses associated with such updates. Thanx, Paul On Fri, Dec 08, 2017 at 09:21:16AM +0800, LoopJump wrote: > Hi Paul, > > I am a database kernel developer, I really love your book about parallel programming. > > Now I encounter a question about a high performance sliding window in a multi threads program. > > In my question, incoming request has a consecutive sequence(integer), these requests will be pushed into a thread pool(16 threads), what I want is to track the max_consecutive_sequence that has been processed. > > I tried to use a bitmap (a bit for one sequence or replace a bit with 64bytes item to avoid false sharing) to mark each sequence. Everytime a request is processed, it check if it can advance the max_consecutive_sequence and determine how many sequence it can advance. > > I do not know if there is a better way to implement such a sliding window. > > Thanks a lot for your help. > > > > LoopJump -- To unsubscribe from this list: send the line "unsubscribe perfbook" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html