Re: Benchmark Data requested

"Heikki Linnakangas" <heikki@xxxxxxxxxxxxxxxx> · Tue, 05 Feb 2008 21:45:52 +0000

Jignesh K. Shah wrote:
Is there a way such an operation can be spawned as a worker process? 
Generally, during such loads, which most people will run during 
off-peak hours, I expect additional CPU resources to be available. By 
delegating such additional work to worker processes, we should be able 
to capitalize on additional cores in the system.

Hmm. You do need access to shared memory, locks, catalogs, and to run 
functions etc, so I don't think it's significantly easier than using 
multiple cores for COPY itself.

Even if there is only a single core, the loading process will 
eventually block waiting for a read from the input file, which cannot be 
made non-blocking, so the OS can timeslice well enough for a second 
process to use those wait times for the index population work.
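A minimal sketch of the loader/worker split being proposed, using Python threads as a stand-in for backend processes. The function name, the queue size and the `(key, tid)` representation are all hypothetical; it models none of the shared memory, locking or catalog access that a real PostgreSQL worker would need. It only shows the shape of the idea: the loader hands index entries to a second thread, which can make progress whenever the loader is blocked on input.

```python
import queue
import threading

SENTINEL = None  # marks end of input for the index worker


def load_with_index_worker(rows):
    """Hypothetical model: the loader appends rows to a 'heap' and hands
    (key, tid) pairs to a separate worker that collects index entries."""
    heap = []
    index_entries = []
    # Bounded queue, so the loader cannot run arbitrarily far ahead of the worker.
    q = queue.Queue(maxsize=1024)

    def index_worker():
        while True:
            item = q.get()
            if item is SENTINEL:
                break
            index_entries.append(item)
        index_entries.sort()  # stand-in for the final index build

    worker = threading.Thread(target=index_worker)
    worker.start()
    for row in rows:  # in a real loader, this read would block on the input file
        tid = len(heap)  # position in the heap acts as the tuple id
        heap.append(row)
        q.put((row[0], tid))
    q.put(SENTINEL)
    worker.join()  # index entries are complete once the worker exits
    return heap, index_entries
```

For example, `load_with_index_worker([(3, "c"), (1, "a"), (2, "b")])` returns the heap in load order and the index entries sorted by key. Under CPython's GIL the overlap only pays off while the loader is blocked on I/O, which is exactly the single-core case described above.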

That's an interesting point.

What do you think?

Regards,
Jignesh

Heikki Linnakangas wrote:
Dimitri Fontaine wrote:
On Tuesday 05 February 2008, Simon Riggs wrote:
I'll look at COPY FROM internals to make this faster. I'm looking at
this now to refresh my memory; I already had some plans on the shelf.

Maybe stealing some ideas from pg_bulkload could somewhat help here?

http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf 

IIRC it's mainly about how to optimize index updating while loading 
data, and I've heard complaints along the lines of "this external tool 
has to know too much about PostgreSQL internals to be trustworthy as 
non-core code"... so...

I've been thinking of looking into that as well. The basic trick 
pg_bulkload is using is to populate the index as the data is being 
loaded. There's no fundamental reason why we couldn't do that 
internally in COPY. Triggers or constraints that access the table 
being loaded would make it impossible, but we should be able to detect 
that and fall back to what we have now.
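The fallback rule can be sketched as a simple check made before the load starts. The `Table` and `Trigger` classes and their fields are hypothetical; in particular, a trigger function is arbitrary code, so a real implementation would probably have to assume conservatively that any trigger might read the target table rather than rely on a declared `reads_tables` set.

```python
from dataclasses import dataclass, field


@dataclass
class Trigger:
    name: str
    # Tables the trigger function may read (hypothetical metadata).
    reads_tables: set = field(default_factory=set)


@dataclass
class Table:
    name: str
    triggers: list = field(default_factory=list)
    # Hypothetical flag: a constraint that must look at other rows of the
    # same table, e.g. a self-referencing foreign key.
    self_referencing_constraints: bool = False


def choose_index_strategy(table):
    """Return 'build_during_load' when nothing can observe the partially
    loaded table through its indexes; otherwise fall back to
    'insert_per_row', i.e. the existing behaviour."""
    if table.self_referencing_constraints:
        return "insert_per_row"
    for trig in table.triggers:
        if table.name in trig.reads_tables:
            return "insert_per_row"
    return "build_during_load"
```

A table whose only trigger writes to a separate audit table would still qualify for the deferred build, while a trigger that looks up rows in the table being loaded forces the per-row path.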

What I'm basically thinking about is to modify the indexam API for 
building a new index, so that COPY would feed the tuples to the 
indexam, instead of the indexam opening and scanning the heap. The 
b-tree indexam would spool the tuples into a tuplesort as the COPY 
progresses, and build the index from that at the end as usual.
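A minimal sketch of the inverted build API described above: the loader pushes each `(key, tid)` into a build object, which only spools it, and a single `finish()` call sorts the spool and builds the tree bottom-up. `SpoolingBTreeBuild`, `add`, `finish` and `leaf_capacity` are hypothetical names, a plain Python list stands in for tuplesort, and pages are modelled as lists of entries rather than real index pages.

```python
class SpoolingBTreeBuild:
    """Hypothetical index-build object that the loader feeds tuples to,
    instead of the index AM scanning the heap itself."""

    def __init__(self, leaf_capacity=4):
        self.leaf_capacity = leaf_capacity
        self.spool = []  # stand-in for a tuplesort

    def add(self, key, tid):
        # Called by the loader once per heap tuple it has just written.
        self.spool.append((key, tid))

    def _paginate(self, entries):
        # Split a sorted list of entries into pages of leaf_capacity entries.
        return [entries[i:i + self.leaf_capacity]
                for i in range(0, len(entries), self.leaf_capacity)]

    def finish(self):
        # Sort once, pack the sorted entries into leaf pages, then build
        # each parent level from (first key, child page number) pairs
        # until a single root page remains. Returns levels, leaves first.
        self.spool.sort()
        level = self._paginate(self.spool)
        levels = [level]
        while len(level) > 1:
            separators = [(page[0][0], child_no)
                          for child_no, page in enumerate(level)]
            level = self._paginate(separators)
            levels.append(level)
        return levels
```

With `leaf_capacity=2` and keys 5, 3, 1, 4, 2 added in that order, `finish()` produces three leaf pages, then two internal pages, then a single root. The design point is that the sort happens once at the end, as with a normal CREATE INDEX, rather than one index insertion per loaded row.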

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
