Re: Benchmark Data requested

Dimitri Fontaine <dfontaine@xxxxxxxxxxxx> · Wed, 6 Feb 2008 11:29:42 +0100

On Wednesday 06 February 2008, Greg Smith wrote:
> pgloader is a great tool for a lot of things, particularly if there's any
> chance that some of your rows will get rejected.  But the way things pass
> through the Python/psycopg layer made it uncompetitive (more than 50%
> slowdown) against the straight COPY path from a rows/second perspective
> the last time (V2.1.0?) 

I have yet to integrate the psycopg wrapper Marko wrote for skytools: at the
moment I'm using the psycopg1 interface even when psycopg2 is available, and
the new interface seems to bring some significant performance improvements. I
didn't bother until now because I assumed this wouldn't affect COPY.

> I did what I thought was a fair test of it (usual 
> caveat of "with the type of data I was loading").  Maybe there's been some
> gigantic improvement since then, but it's hard to beat COPY when you've
> got an API layer or two in the middle.

Did you compare against COPY or \copy? After all, I wouldn't expect the
psycopg COPY API to be much more costly than the psql one.
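For reference, a client-side COPY through psycopg2 roughly amounts to
serializing rows into PostgreSQL's COPY text format and streaming them with
`cursor.copy_expert`, much as psql's \copy streams a local file. This is a
minimal sketch, assuming psycopg2; the helper names `to_copy_line` and
`copy_rows` are hypothetical and not part of pgloader.

```python
import io


def _escape(value):
    """Escape one field for COPY text format; None becomes \\N (NULL)."""
    if value is None:
        return r"\N"
    text = str(value)
    # Backslash must be escaped first so later escapes aren't doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_copy_line(row):
    """Serialize one row as a tab-separated COPY text line."""
    return "\t".join(_escape(v) for v in row) + "\n"


def copy_rows(cursor, table, rows):
    """Stream rows into `table` using COPY ... FROM STDIN.

    `cursor` is assumed to be a psycopg2 cursor; only its
    copy_expert(sql, file) method is used. `table` is interpolated
    as-is, so it must be a trusted identifier.
    """
    buf = io.StringIO()
    for row in rows:
        buf.write(to_copy_line(row))
    buf.seek(0)  # rewind so copy_expert reads from the start
    cursor.copy_expert(f"COPY {table} FROM STDIN", buf)
```

With a real connection this would be called as `copy_rows(conn.cursor(),
"mytable", rows)`. Buffering everything in memory keeps the sketch short; a
loader handling large files would rather feed COPY in batches.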
Where pgloader really falls behind COPY (in terms of tuples inserted per
second) is, I'd say, when it has to juggle the data a lot (reformat, reorder,
add constants, etc.). But I've tried to design it so that when it's not
configured to rearrange (massage?) the data, the code path is as simple as
possible.
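To illustrate that design idea: when no transformation is configured, each
input line can be handed to COPY untouched, and only when reordering or
constants are requested does the loader pay for splitting and rebuilding the
row. This is a minimal sketch under that assumption; `make_row_formatter`,
`columns` and `constants` are hypothetical names, not pgloader's actual code
or configuration options.

```python
def make_row_formatter(columns=None, constants=None, sep="\t"):
    """Return a function mapping one input line to one COPY line.

    columns:   optional list of input field indexes, in output order
    constants: optional list of string values appended to every row
    """
    if not columns and not constants:
        # Fast path: nothing to massage, pass the line through as-is.
        return lambda line: line

    def massage(line):
        # Slow path: split, reorder/project, append constants, rejoin.
        fields = line.rstrip("\n").split(sep)
        if columns:
            fields = [fields[i] for i in columns]
        if constants:
            fields = fields + list(constants)
        return sep.join(fields) + "\n"

    return massage
```

Choosing the formatter once, up front, keeps the per-row cost of the
unconfigured case down to a single function call.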

Do you want to test pgloader again with Marko's psycopg wrapper code to see if
it helps? If so, I'll push it to CVS ASAP.

Maybe in the end the PostgreSQL backend code will become smarter than pgloader
(wrt error handling and data massaging) and we'll be able to drop the
project, but in the meantime I'll do my best to make pgloader as fast as
possible :)
-- 
dim