Re: Performance on Bulk Insert to Partitioned Table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Charles,

I am not working on Postgres anymore and none of our patches were ever accepted by the community. The list of development I made can still be found at http://wiki.postgresql.org/wiki/Aster%27s_Development_Projects

All the code related to these improvements must still be accessible in the archive. If you can't find something, let me know, I'll try to find it in my backups!

Happy holidays
Emmanuel


On 12/24/2012 13:36, Charles Gomes wrote:
I've just found this:
From:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01221.php

"initial tests to insert 140k rows are as follows:

- direct inserts in a child table: 2 seconds

- pgplsql trigger (IF ... ELSE IF ... blocks) : 14.5 seconds.

- C trigger: 4 seconds (actually the overhead is in the constraint check)

"

This is from 2008 and looks like at that time those folks where already having performance issues with partitions.

Going to copy some folks from the old thread, hopefully 4 years later they may have found a solution. Maybe they've moved on into something more exciting, maybe He is in another world where we don't have database servers. In special the brave Emmanuel for posting his trigger code that I will hack into my own :P
Thanks Emmanuel.






----------------------------------------
Subject: Re:  Performance on Bulk Insert to Partitioned Table
From: itparanoia@xxxxxxxxx
Date: Mon, 24 Dec 2012 21:11:07 +0400
CC: jeff.janes@xxxxxxxxx; ondrej.ivanic@xxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx
To: charlesrg@xxxxxxxxxxx


On Dec 24, 2012, at 9:07 PM, Charles Gomes <charlesrg@xxxxxxxxxxx> wrote:

By the way, I've just re-wrote the code to target the partitions individually and I've got almost 4 times improvement.
Shouldn't it be faster to process the trigger, I would understand if there was no CPU left, but there is lots of cpu to chew.
I saw your 20% idle cpu and raise eyebrows.

It seems that there will be no other way to speedup unless the insert code is partition aware.

----------------------------------------
From: charlesrg@xxxxxxxxxxx
To: jeff.janes@xxxxxxxxx
CC: ondrej.ivanic@xxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re:  Performance on Bulk Insert to Partitioned Table
Date: Mon, 24 Dec 2012 10:51:12 -0500

________________________________
Date: Sun, 23 Dec 2012 14:55:16 -0800
Subject: Re:  Performance on Bulk Insert to Partitioned Table
From: jeff.janes@xxxxxxxxx
To: charlesrg@xxxxxxxxxxx
CC: ondrej.ivanic@xxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx

On Thursday, December 20, 2012, Charles Gomes wrote:
True, that's the same I feel, I will be looking to translate the
trigger to C if I can find good examples, that should accelerate.

I think your performance bottleneck is almost certainly the dynamic
SQL. Using C to generate that dynamic SQL isn't going to help much,
because it is still the SQL engine that has to parse, plan, and execute
it.

Are the vast majority if your inserts done on any given day for records
from that same day or the one before; or are they evenly spread over
the preceding year? If the former, you could use static SQL in IF and
ELSIF for those days, and fall back on the dynamic SQL for the
exceptions in the ELSE block. Of course that means you have to update
the trigger every day.


Using rules would be totally bad as I'm partitioning daily and after
one year having 365 lines of IF won't be fun to maintain.

Maintaining 365 lines of IF is what Perl was invented for. That goes
for triggers w/ static SQL as well as for rules.

If you do the static SQL in a trigger and the dates of the records are
evenly scattered over the preceding year, make sure your IFs are nested
like a binary search, not a linear search. And if they are mostly for
"today's" date, then make sure you search backwards.

Cheers,

Jeff
Jeff, I've changed the code from dynamic to:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
r_date text;
BEGIN
r_date = to_char(new.received_time, 'YYYY_MM_DD');
case r_date
when '2012_09_10' then
insert into quotes_2012_09_10 values (NEW.*) using new;
return;
when '2012_09_11' then
insert into quotes_2012_09_11 values (NEW.*) using new;
return;
when '2012_09_12' then
insert into quotes_2012_09_12 values (NEW.*) using new;
return;
when '2012_09_13' then
insert into quotes_2012_09_13 values (NEW.*) using new;
return;
when '2012_09_14' then
insert into quotes_2012_09_14 values (NEW.*) using new;
return;
when '2012_09_15' then
insert into quotes_2012_09_15 values (NEW.*) using new;
return;
when '2012_09_16' then
insert into quotes_2012_09_16 values (NEW.*) using new;
return;
when '2012_09_17' then
insert into quotes_2012_09_17 values (NEW.*) using new;
return;
when '2012_09_18' then
insert into quotes_2012_09_18 values (NEW.*) using new;
return;
when '2012_09_19' then
insert into quotes_2012_09_19 values (NEW.*) using new;
return;
when '2012_09_20' then
insert into quotes_2012_09_20 values (NEW.*) using new;
return;
when '2012_09_21' then
insert into quotes_2012_09_21 values (NEW.*) using new;
return;
when '2012_09_22' then
insert into quotes_2012_09_22 values (NEW.*) using new;
return;
when '2012_09_23' then
insert into quotes_2012_09_23 values (NEW.*) using new;
return;
when '2012_09_24' then
insert into quotes_2012_09_24 values (NEW.*) using new;
return;
end case
RETURN NULL;
END;
$$
LANGUAGE plpgsql;


However I've got no speed improvement.
I need to keep two weeks worth of partitions at a time, that's why all the WHEN statements.
Wish postgres could automate the partition process natively like the other sql db.

Thank you guys for your help.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance 		 	   		



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux