Re: Benchmark Data requested

On Tue, 2008-02-05 at 15:06 +0100, Dimitri Fontaine wrote:
> Hi,
> 
> On Monday 04 February 2008, Jignesh K. Shah wrote:
> > PostgreSQL's single-stream loader takes hours to load the data (a single
> > stream load wastes all the extra cores out there).
> 
> I wanted to work on this at the pgloader level, so the CVS version of pgloader
> is now able to load data in parallel, with one Python thread per configured
> section (1 section = 1 data file = 1 table is often the case).
> It is not configurable at the moment, but I plan to provide a "threads" knob
> that will default to 1 and can be set to -1 for "as many threads as sections".
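
A minimal sketch of the thread-per-section idea with the proposed "threads" knob, assuming the semantics described above (1 = serial, -1 = one thread per section). This is not pgloader's code: `load_all` and the `load_section` callback (which would run one COPY FROM for a section's data file and return a row count) are hypothetical names. Since the COPY work runs on the server, the Python threads should mostly be waiting on I/O, assuming the database driver releases the GIL during network calls.

```python
import concurrent.futures
from typing import Callable, Dict


def load_all(sections: Dict[str, str],
             load_section: Callable[[str, str], int],
             threads: int = 1) -> Dict[str, int]:
    """Run load_section(name, data_file) for every configured section.

    Hypothetical helper: threads=1 keeps serial behaviour,
    threads=-1 starts one worker per section.
    """
    if threads == -1:
        workers = max(1, len(sections))  # at least one worker, even with no sections
    elif threads >= 1:
        workers = threads
    else:
        raise ValueError("threads must be >= 1 or -1")

    results: Dict[str, int] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per section (1 section = 1 data file = 1 table).
        futures = {pool.submit(load_section, name, path): name
                   for name, path in sections.items()}
        for fut in concurrent.futures.as_completed(futures):
            # .result() re-raises any exception from the loader thread.
            results[futures[fut]] = fut.result()
    return results
```

With a stub loader the function can be exercised without a database, e.g. `load_all({"orders": "orders.tbl"}, lambda name, path: 0, threads=-1)`.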

That sounds great. I was just thinking of asking for that :-)

I'll look at COPY FROM internals to make this faster. I'm looking at
this now to refresh my memory; I already had some plans on the shelf.

> > Multiple table loads (one per table) spawned via a script are a bit better
> > but hit WAL problems.
> 
> pgloader will hit the WAL problem too, but it may still have its benefits, or
> at least we will soon (you already can, if you take it from CVS) be able to
> measure whether parallel loading on the client side is a good idea performance-wise.

Should be able to reduce lock contention, but not overall WAL volume.

> [...]
> > I have not even started partitioning tables yet, since with the
> > current framework you have to load the data separately into each
> > table, which means that for the TPC-H data you need "extra logic" to take
> > the table data and split it into each partition child table. Not stuff
> > that many people want to do by hand.
> 
> I'm planning to add ddl-partitioning support to pgloader:
>   http://archives.postgresql.org/pgsql-hackers/2007-12/msg00460.php
> 
> The basic idea is for pgloader to ask PostgreSQL about constraint_exclusion,
> pg_inherits and pg_constraint. If pgloader recognizes both the CHECK
> expression and the datatypes involved, and if we can implement the CHECK in
> Python without having to query PostgreSQL, then we can run one thread per
> partition, with as many COPY FROM commands running in parallel as there are
> partitions involved (when threads = -1).
> 
> I'm not sure this will be quicker than relying on PostgreSQL triggers or rules
> as currently used for partitioning, but ISTM the paragraph I quoted from
> Jignesh is about exactly that.
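
The client-side routing described in the quoted proposal can be sketched as follows. It assumes the CHECK constraints read from pg_constraint have already been recognized as half-open ranges of the form CHECK (col >= lower AND col < upper). Parsing the constraint text itself is not shown. `RangePartition` and `PartitionRouter` are hypothetical names, not part of pgloader. Each bucket returned by `split` would then feed its own COPY FROM thread, one per partition.

```python
import bisect
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence


@dataclass(frozen=True)
class RangePartition:
    """Hypothetical model of CHECK (col >= lower AND col < upper)."""
    table: str
    lower: Any  # inclusive bound
    upper: Any  # exclusive bound


class PartitionRouter:
    """Evaluates recognized range CHECKs in Python, without querying PostgreSQL."""

    def __init__(self, partitions: Sequence[RangePartition]):
        self.parts = sorted(partitions, key=lambda p: p.lower)
        # Overlapping ranges would make routing ambiguous, so reject them.
        for a, b in zip(self.parts, self.parts[1:]):
            if a.upper > b.lower:
                raise ValueError(f"overlapping partitions {a.table} and {b.table}")
        self.lowers = [p.lower for p in self.parts]

    def route(self, key: Any) -> str:
        # Find the last partition whose lower bound is <= key.
        i = bisect.bisect_right(self.lowers, key) - 1
        if i >= 0 and key < self.parts[i].upper:
            return self.parts[i].table
        raise LookupError(f"no partition accepts key {key!r}")

    def split(self, rows: Sequence[tuple], key_index: int) -> Dict[str, List[tuple]]:
        # One bucket per child table; each could feed a separate COPY FROM.
        buckets: Dict[str, List[tuple]] = {p.table: [] for p in self.parts}
        for row in rows:
            buckets[self.route(row[key_index])].append(row)
        return buckets
```

A binary search over the sorted lower bounds keeps per-row routing logarithmic in the number of partitions. Rows that match no partition raise an error rather than being silently dropped.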

Much better than triggers and rules, but it will be hard to get it to
work.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

