Re: Benchmark Data requested --- pgloader CE design ideas

Greg Smith <gsmith@xxxxxxxxxxxxx> · Thu, 7 Feb 2008 12:06:42 -0500 (EST)

On Thu, 7 Feb 2008, Dimitri Fontaine wrote:

> I was thinking of not even reading the file content from the controller
> thread, just decide splitting points in bytes (0..ST_SIZE/4 -
> ST_SIZE/4+1..2*ST_SIZE/4 etc) and let the reading thread fine-tune by
> beginning to process input after having read first newline, etc.

The problem I was pointing out is that if chunk#2 moved forward a few bytes
before it started reading in search of a newline, how will chunk#1 know 
that it's supposed to read up to that further point?  You have to stop #1 
from reading further when it catches up with where #2 started.  Since the 
start of #2 is fuzzy until some reading is done, what you're describing 
will need #2 to send some feedback to #1 after they've both started, and 
that sounds bad to me.  I like designs where the boundaries between 
threads are clearly defined before any of them start and none of them ever 
talk to the others.
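
To make that concrete, here is a minimal sketch of one way the controller 
could pin down exact boundaries up front without reading the whole file: 
seek to each nominal split point, read one line to find the next newline, 
and record that offset.  The names chunk_boundaries and read_chunk are 
made up for illustration and are not pgloader functions, and the sketch 
assumes every newline ends a record (no newlines embedded in quoted 
fields).

```python
import os

def chunk_boundaries(path, n_chunks):
    """Return (start, end) byte ranges that each begin at a line start.

    Hypothetical helper: the controller does one seek and one readline
    per split point, so it never scans the whole file.
    """
    size = os.path.getsize(path)
    starts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            target = i * size // n_chunks
            if target <= starts[-1]:
                continue  # a long line already ran past this split point
            # Start one byte early so a split that already sits on a
            # line start is kept as-is instead of skipping a whole line.
            f.seek(target - 1)
            f.readline()
            pos = f.tell()
            if pos >= size:
                break  # no complete line left for another chunk
            if pos > starts[-1]:
                starts.append(pos)
    ends = starts[1:] + [size]
    return list(zip(starts, ends))

def read_chunk(path, start, end):
    """Worker side: read exactly one fixed byte range, nothing more."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start).splitlines()
```

Each worker would be handed one (start, end) pair before it starts and 
stops at its own end offset, so no thread needs to tell another where it 
really began.  The controller's cost is a handful of seeks, not a pass 
over the data.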

> In both cases, maybe it would also be needed for pgloader to be able to have a
> separate thread for COPYing the buffer to the server, allowing it to continue
> preparing next buffer in the meantime?

That sounds like a V2.0 design to me.  I'd only chase after that level of 
complexity if profiling suggests that's where the bottleneck really is.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD
