Re: Improving performance of merging data between tables


On 12/28/2014 3:49 PM, Pawel Veselov wrote:
> Hi.
>
> I was wondering if anybody would have any ideas on how to improve
> certain operations that we are having.
>
> <SNIP>
>
> Besides "can somebody please look at this and let me know if I'm doing
> something utterly stupid", here are my questions.
>
> 1) How do I find out what exactly is consuming the CPU in a PL/pgSQL
> function? All I see is that the calls to merge_all() function take long
> time, and the CPU is high while this is going on.



First, I'll admit I didn't read your entire post.

I can think of a couple methods:

1) Try each of the statements in merge_all by hand with "explain analyze" in front to see which one is slow. Look for statements that hit big tables without using an index, and check that the columns used in FK lookups are indexed.

2) Try pg_stat_statements, setting "pg_stat_statements.track = all" so that statements run inside functions are tracked too. See:
http://www.postgresql.org/docs/9.4/static/pgstatstatements.html

I have used this to profile some functions, and it worked pretty well. Mostly I use it on a test box, but I once ran it on the live server, which was scary, but it worked great.

3) Try auto_explain:
http://www.postgresql.org/docs/9.4/static/auto-explain.html

I've never used it, so I don't know if it'll show each statement inside a function. It dumps its output to the log AFAIK, so you'll have to dig out the info by hand.
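
The three methods above can be sketched as a single psql session. This is only an illustration under stated assumptions: main_table, node_table, ext_key and hits are hypothetical names standing in for whatever merge_all() really touches, and merge_all() is assumed to take no arguments. For auto_explain, the auto_explain.log_nested_statements setting is what makes it log the plans of statements executed inside a function.

```sql
-- 1) EXPLAIN ANALYZE a single statement by hand.
--    EXPLAIN ANALYZE really executes the statement, so wrap data-modifying
--    statements in a transaction and roll it back.
--    (main_table, node_table, ext_key, hits are hypothetical names.)
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE main_table AS m
   SET hits = m.hits + n.hits
  FROM node_table AS n
 WHERE m.ext_key = n.ext_key;
ROLLBACK;

-- 2) pg_stat_statements.
--    Needs these lines in postgresql.conf, followed by a server restart:
--      shared_preload_libraries = 'pg_stat_statements'
--      pg_stat_statements.track = all
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT pg_stat_statements_reset();   -- start from clean counters
SELECT merge_all();                  -- assumed call signature
-- total_time is the 9.4 column name; PostgreSQL 13+ calls it total_exec_time.
SELECT calls, total_time, rows, query
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 10;

-- 3) auto_explain, enabled for this session only (superuser required).
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_analyze = on;            -- include actual timings and row counts
SET auto_explain.log_nested_statements = on;  -- include statements inside functions
SELECT merge_all();                           -- plans are written to the server log
```

Setting auto_explain.log_analyze adds timing overhead to every statement in the session, so this is better suited to a test box than to a live server.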



> 2) Is there a better way to merge individual rows, except doing
> UPDATE/INSERT in a loop, and would that be CPU expensive?
>

Not that I know of. I use pretty much the same thing. Soon we will have merge/upsert support. Hopefully it'll be fast.
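
For readers on newer servers: the upsert support anticipated here shipped as INSERT ... ON CONFLICT in PostgreSQL 9.5, after this message was written. A minimal sketch, assuming a hypothetical counters table that is not part of the original schema:

```sql
-- Hypothetical table, for illustration only:
--   CREATE TABLE counters (key text PRIMARY KEY, hits bigint NOT NULL);

-- Insert the row, or add to the existing row if the key is already present.
-- Requires PostgreSQL 9.5 or later.
INSERT INTO counters AS c (key, hits)
VALUES ('page:/index', 1)
ON CONFLICT (key)
DO UPDATE SET hits = c.hits + EXCLUDED.hits;
```

EXCLUDED refers to the row that was proposed for insertion, so a single statement replaces the UPDATE-then-INSERT loop for one row.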

> 3) Is there a better way to merge whole tables? However, note that I
> need to translate primary keys from node main table into the common main
> table, as they are used as foreign keys, hence the loops. I suspect the
> looping is CPU intensive.

Avoiding loops and doing things as sets is the best way, if possible. The only loop I saw was the one over the merge_xxx tables, which is probably the only way to do that part.

If possible (and if you haven't already), you could add an extra column to your secondary table that you can set to the main table's key.

bulk insert into second;
update second set magic = (select key from main where ... );

Then, maybe, you can do two ops in batch:

update main (where key exists in main)
insert into main (where key not exists in main)
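
The outline above could be fleshed out roughly as follows. This is a sketch, not the poster's schema: the columns ext_id and hits are hypothetical, and it assumes second holds at most one row per ext_id and that nothing else writes to main while the batch runs.

```sql
-- Hypothetical schema, for illustration only:
--   main  (key serial PRIMARY KEY, ext_id text UNIQUE, hits bigint)
--   second(ext_id text, hits bigint, magic integer)

BEGIN;

-- Translate keys in one pass. magic stays NULL for rows not yet in main.
UPDATE second AS s
   SET magic = m.key
  FROM main AS m
 WHERE m.ext_id = s.ext_id;

-- Batch 1: rows whose key already exists in main.
UPDATE main AS m
   SET hits = m.hits + s.hits
  FROM second AS s
 WHERE m.key = s.magic;

-- Batch 2: rows whose key does not exist in main yet.
INSERT INTO main (ext_id, hits)
SELECT s.ext_id, s.hits
  FROM second AS s
 WHERE s.magic IS NULL;

COMMIT;
```

If second can contain several rows for the same ext_id, aggregate them first (for example with GROUP BY ext_id), because UPDATE ... FROM applies only one matching row per target row.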

-Andy







--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


