On Mon, Sep 10, 2012 at 9:34 AM, Craig Gibson <craiggib@xxxxxxxxx> wrote:
> Hi all
>
> I am no database wizard so I am hoping someone may be able to assist me :)
>
> I get a daily CSV file of 6.5 million records. I create a temporary
> table and COPY them in. On completion I create an index on the mdnid
> column. This column is also indexed in table 2. This part is very
> fast. I had some 'checkpoint too often' issues, but those I have
> resolved.
>
> I then use the following procedure to update all the records, and if a
> record does not exist, insert it instead:
> ...
>
> From my understanding, a for loop is encapsulated in a cursor anyway,
> so there is no need to do that. Am I fundamentally doing something
> wrong, as the operation is slow as molasses?

How big is the parent table? Are you CPU limited or IO limited?

If you are not CPU limited, then I would guess that the indexes on the
parent table do not fit in RAM or shared_buffers, and that maintaining
the indexes on the parent table during the updates/inserts is the
bottleneck.

> Maybe there is a better way altogether
> that I have not thought of? The bottom line is that at no point can
> the e_lookup table be unavailable to clients,

Does it always have to be available for updates, or just for selects?

> else I would have just
> done a simple drop and rename post the COPY.

Maybe you can just do an atomic rename.

Cheers,

Jeff
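Craig's procedure body was snipped above, but the step it describes (update each staging row into e_lookup, insert it if missing) can usually be expressed as two set-based statements rather than a per-row loop. A minimal sketch, assuming the staging table is named e_import and carries a single illustrative payload column val (only e_lookup and mdnid come from the thread):

    BEGIN;

    -- Update the rows that already exist in e_lookup from the freshly
    -- loaded staging table (staging table and non-key column names are
    -- illustrative).
    UPDATE e_lookup l
       SET val = s.val
      FROM e_import s
     WHERE l.mdnid = s.mdnid;

    -- Insert the rows that are not in e_lookup yet.
    INSERT INTO e_lookup (mdnid, val)
    SELECT s.mdnid, s.val
      FROM e_import s
     WHERE NOT EXISTS (SELECT 1 FROM e_lookup l WHERE l.mdnid = s.mdnid);

    COMMIT;

Two statements over 6.5 million rows avoid the per-row overhead of a loop, though a concurrent writer could still insert a conflicting mdnid between the two statements; on 2012-era PostgreSQL (before INSERT ... ON CONFLICT existed) that is normally handled by locking the table or serializing the load.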
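To answer Jeff's question about whether the parent table's indexes fit in memory, the sizes can be read straight from the catalogs; this is a generic check, not something from the thread:

    -- Combined size of all indexes on e_lookup, plus the configured buffer pool.
    SELECT pg_size_pretty(pg_indexes_size('e_lookup')) AS index_size;
    SHOW shared_buffers;

If the index total is much larger than shared_buffers (and than RAM), each random index update during the load is likely to hit disk, which would match the "slow as molasses" symptom.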
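The "atomic rename" Jeff mentions can be sketched as a name swap done inside one transaction, assuming the new data has already been loaded and indexed in a table called e_lookup_new (a hypothetical name):

    BEGIN;
    -- Both renames commit together, so clients either see the old
    -- e_lookup or the new one, never a missing table.
    ALTER TABLE e_lookup     RENAME TO e_lookup_old;
    ALTER TABLE e_lookup_new RENAME TO e_lookup;
    COMMIT;

    DROP TABLE e_lookup_old;

The renames take an ACCESS EXCLUSIVE lock, so queries against e_lookup block for the instant of the swap rather than failing; note, however, that views and foreign keys follow the table they were created against rather than the name, so they would keep pointing at e_lookup_old.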