Re: Benchmark Data requested

On Tuesday 05 February 2008, Simon Riggs wrote:
> I'll look at COPY FROM internals to make this faster. I'm looking at
> this now to refresh my memory; I already had some plans on the shelf.

Maybe borrowing some ideas from pg_bulkload would help here?
  http://pgfoundry.org/docman/view.php/1000261/456/20060709_pg_bulkload.pdf

IIRC it's mainly about optimizing index updates while loading data, and 
I've heard complaints along the lines of "this external tool has to know too 
much about PostgreSQL internals to be trustworthy as non-core code"... so...

> > The basic idea is for pgloader to ask PostgreSQL about
> > constraint_exclusion, pg_inherits and pg_constraint. If pgloader
> > recognizes both the CHECK expression and the datatypes involved, and if we
> > can implement the CHECK in Python without having to resort to querying
> > PostgreSQL, then we can run one thread per partition, with as many COPY
> > FROM commands running in parallel as there are partitions involved (when
> > threads = -1).
> >
> > I'm not sure this will be quicker than relying on the PostgreSQL triggers
> > or rules currently used for partitioning, but ISTM the paragraph Jignesh
> > quoted is about just that.
>
> Much better than triggers and rules, but it will be hard to get it to
> work.

Well, I'm thinking of a somewhat modular approach where the pgloader 
code recognizes CHECK constraints, loads the module registered for the 
operator and data types involved, then uses it.
Modules would be registered at the configuration level: I'll provide some 
defaults and users will be able to add their own code, the same way 
on-the-fly reformat modules are handled now.

This means that I should (hopefully) be able to provide the basic cases 
quickly (CHECK on dates >= x and < y, numeric ranges, etc.), and users will 
be able to take care of more complex setups.

When the constraint doesn't match any configured pgloader exclusion module, 
the trigger/rule code will get used (COPY will go to the main table). When 
the Python CHECK implementation is wrong (worst case), PostgreSQL will 
reject the data and pgloader will fill your reject data and log files. And 
you're back to debugging your Python CHECK implementation...
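
As an illustration of that fallback, a minimal sketch of the routing step 
could look like the following. The function names, the table names and the 
choice of the first column as partition key are all assumptions made for the 
example, not pgloader code. Each resulting batch would then be handed to its 
own COPY FROM.

```python
import datetime


def date_range(lower, upper):
    """Python version of CHECK (col >= lower AND col < upper) on ISO dates."""
    fmt = "%Y-%m-%d"
    lo = datetime.datetime.strptime(lower, fmt).date()
    hi = datetime.datetime.strptime(upper, fmt).date()
    return lambda value: lo <= datetime.datetime.strptime(value, fmt).date() < hi


def route_rows(rows, partitions, parent_table, key_index=0):
    """Group rows by target table.

    partitions is a list of (table_name, check) pairs. Rows that match no
    check go to parent_table, where the existing triggers or rules apply.
    """
    batches = {}
    for row in rows:
        target = parent_table
        for table_name, check in partitions:
            if check(row[key_index]):
                target = table_name
                break
        batches.setdefault(target, []).append(row)
    return batches


# Hypothetical partitions of a hypothetical "events" table.
partitions = [
    ("events_2008_01", date_range("2008-01-01", "2008-02-01")),
    ("events_2008_02", date_range("2008-02-01", "2008-03-01")),
]
rows = [("2008-01-15", "a"), ("2008-02-05", "b"), ("2009-06-01", "c")]
batches = route_rows(rows, partitions, "events")
```

If a Python check is wrong and sends a row to the wrong partition, the real 
CHECK constraint on that table still rejects it at COPY time, so the worst 
case is a rejected row rather than misplaced data.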

All of this is only a braindump as of now, and maybe quite an optimistic 
one... but barring any 'I know this can't work' objection, that's what I'm 
gonna try to implement for the next pgloader version.

Thanks for the comments, input is really appreciated!
-- 
dim


