Re: ask for help to implement a high performance sliding window

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Mon, 11 Dec 2017 10:10:33 -0800

Hello, LoopJump,

Glad you like the book!  Adding perfbook@xxxxxxxxxxxxxxx on CC in case
someone there has some good ideas.

First, I do feel the need to ask if you have actually benchmarked the
system and seen maximum computation to be a problem, perhaps on a profile.
After all, there is little payoff in optimizing portions of the system
that are not bottlenecks!

Second, let me make sure that I understand the problem:

1.	Your system receives sequence numbers on incoming packets.

2.	Each packets is processed, after which the corresponding sequence
	number should be included in a global "max" computation.

3.	If the whole thing isn't protected by a global lock, we know
	that the thread reading out the maximum value will get a snapshot
	that immediately becomes stale.  The reason for this instant
	staleness is that some other thread might immediately complete
	processing for another packet, and that packet's sequence number
	might update the maximum.

4.	I am assuming that the sequence number is large enough that
	overflow is not a problem.  If overflow is possible, things
	become quite interesting when a given packet's processing takes
	a very long time.

5.	I am assuming that packets are processed very quickly.	If this
	assumption is incorrect and packets take (say) ten milliseconds
	to process, then you should be able to use pretty much any method
	of updating the maximum.  But if packet processing was that slow,
	you probably wouldn't be asking me this question.

Third, some thoughts on implementing this efficiently.

By #3 above, we know that the thread reading out the maximum can have
a stale value.  Just how much staleness can be tolerated?

If (say) a millisecond of staleness is OK, one approach is to have each
thread maintain the maximum sequence number that it has encountered
thus far.  Maintaining this per-thread value should be extremely
efficient.  Then a separate thread (or function or whatever) can run
every millisecond, taking the maximum of the per-thread maxima.

Alternatively, each thread can periodically update the global maximum
based on that thread's local state.

Either way, the point is to update the global maximum less frequently
and thus have less problem with cache misses associated with such updates.

							Thanx, Paul

On Fri, Dec 08, 2017 at 09:21:16AM +0800, LoopJump wrote:
> Hi Paul,
> 
> I am a database kernel developer, I really love your book about parallel programming.
> 
> Now I encounter a question about a high performance sliding window in a multi threads program.
> 
> In my question, incoming request has a consecutive sequence(integer), these requests will be pushed into a thread pool(16 threads), what I want is to track the max_consecutive_sequence that has been processed.
> 
> I tried to use a bitmap (a bit for one sequence or replace a bit with 64bytes item to avoid false sharing) to mark each sequence. Everytime a request is processed, it check if it can advance the max_consecutive_sequence and determine how many sequence it can advance. 
> 
> I do not know if there is a better way to implement such a sliding window.
> 
> Thanks a lot for your help. 
> 
> 
> 
> LoopJump

--
To unsubscribe from this list: send the line "unsubscribe perfbook" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html