Re: Performance on Bulk Insert to Partitioned Table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Emmanuel, I really appreciate you getting back on this old old topic.
Wish you a very very happy Christmas and happy new year.

I'm
 kinda disappointed to see that since 2008 pgsql has not evolved to support native 
partitioning. Partitioning with Triggers is so slow.
Looks like pgsql lost some momentum after departure of contributors with initiative like you.

The code I've copied from your post and I'm modifying it for 9.2 and will post it back here.

Thank you very much,
Charles

----------------------------------------
> Date: Mon, 24 Dec 2012 13:47:12 -0500
> From: cecchet@xxxxxxxxx
> To: charlesrg@xxxxxxxxxxx
> CC: itparanoia@xxxxxxxxx; jeff.janes@xxxxxxxxx; ondrej.ivanic@xxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx; manu@xxxxxxxxxxxxxxx; robertmhaas@xxxxxxxxx; stark@xxxxxxxxxxxxxxxx
> Subject: Re:  Performance on Bulk Insert to Partitioned Table
>
> Hi Charles,
>
> I am not working on Postgres anymore and none of our patches were ever
> accepted by the community.
> The list of development I made can still be found at
> http://wiki.postgresql.org/wiki/Aster%27s_Development_Projects
>
> All the code related to these improvements must still be accessible in
> the archive. If you can't find something, let me know, I'll try to find
> it in my backups!
>
> Happy holidays
> Emmanuel
>
>
> On 12/24/2012 13:36, Charles Gomes wrote:
> > I've just found this:
> > From:
> > http://archives.postgresql.org/pgsql-hackers/2008-12/msg01221.php
> >
> > "initial tests to insert 140k rows are as follows:
> >
> > - direct inserts in a child table: 2 seconds
> >
> > - pgplsql trigger (IF ... ELSE IF ... blocks) : 14.5 seconds.
> >
> > - C trigger: 4 seconds (actually the overhead is in the constraint check)
> >
> > "
> >
> > This is from 2008 and looks like at that time those folks where already having performance issues with partitions.
> >
> > Going to copy some folks from the old thread, hopefully 4 years later they may have found a solution. Maybe they've moved on into something more exciting, maybe He is in another world where we don't have database servers. In special the brave Emmanuel for posting his trigger code that I will hack into my own :P
> > Thanks Emmanuel.
> >
> >
> >
> >
> >
> >
> > ----------------------------------------
> >> Subject: Re:  Performance on Bulk Insert to Partitioned Table
> >> From: itparanoia@xxxxxxxxx
> >> Date: Mon, 24 Dec 2012 21:11:07 +0400
> >> CC: jeff.janes@xxxxxxxxx; ondrej.ivanic@xxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx
> >> To: charlesrg@xxxxxxxxxxx
> >>
> >>
> >> On Dec 24, 2012, at 9:07 PM, Charles Gomes <charlesrg@xxxxxxxxxxx> wrote:
> >>
> >>> By the way, I've just re-wrote the code to target the partitions individually and I've got almost 4 times improvement.
> >>> Shouldn't it be faster to process the trigger, I would understand if there was no CPU left, but there is lots of cpu to chew.
> >> I saw your 20% idle cpu and raise eyebrows.
> >>
> >>> It seems that there will be no other way to speedup unless the insert code is partition aware.
> >>>
> >>> ----------------------------------------
> >>>> From: charlesrg@xxxxxxxxxxx
> >>>> To: jeff.janes@xxxxxxxxx
> >>>> CC: ondrej.ivanic@xxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx
> >>>> Subject: Re:  Performance on Bulk Insert to Partitioned Table
> >>>> Date: Mon, 24 Dec 2012 10:51:12 -0500
> >>>>
> >>>> ________________________________
> >>>>> Date: Sun, 23 Dec 2012 14:55:16 -0800
> >>>>> Subject: Re:  Performance on Bulk Insert to Partitioned Table
> >>>>> From: jeff.janes@xxxxxxxxx
> >>>>> To: charlesrg@xxxxxxxxxxx
> >>>>> CC: ondrej.ivanic@xxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx
> >>>>>
> >>>>> On Thursday, December 20, 2012, Charles Gomes wrote:
> >>>>> True, that's the same I feel, I will be looking to translate the
> >>>>> trigger to C if I can find good examples, that should accelerate.
> >>>>>
> >>>>> I think your performance bottleneck is almost certainly the dynamic
> >>>>> SQL. Using C to generate that dynamic SQL isn't going to help much,
> >>>>> because it is still the SQL engine that has to parse, plan, and execute
> >>>>> it.
> >>>>>
> >>>>> Are the vast majority if your inserts done on any given day for records
> >>>>> from that same day or the one before; or are they evenly spread over
> >>>>> the preceding year? If the former, you could use static SQL in IF and
> >>>>> ELSIF for those days, and fall back on the dynamic SQL for the
> >>>>> exceptions in the ELSE block. Of course that means you have to update
> >>>>> the trigger every day.
> >>>>>
> >>>>>
> >>>>> Using rules would be totally bad as I'm partitioning daily and after
> >>>>> one year having 365 lines of IF won't be fun to maintain.
> >>>>>
> >>>>> Maintaining 365 lines of IF is what Perl was invented for. That goes
> >>>>> for triggers w/ static SQL as well as for rules.
> >>>>>
> >>>>> If you do the static SQL in a trigger and the dates of the records are
> >>>>> evenly scattered over the preceding year, make sure your IFs are nested
> >>>>> like a binary search, not a linear search. And if they are mostly for
> >>>>> "today's" date, then make sure you search backwards.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Jeff
> >>>> Jeff, I've changed the code from dynamic to:
> >>>>
> >>>> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
> >>>> RETURNS trigger AS $$
> >>>> DECLARE
> >>>> r_date text;
> >>>> BEGIN
> >>>> r_date = to_char(new.received_time, 'YYYY_MM_DD');
> >>>> case r_date
> >>>> when '2012_09_10' then
> >>>> insert into quotes_2012_09_10 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_11' then
> >>>> insert into quotes_2012_09_11 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_12' then
> >>>> insert into quotes_2012_09_12 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_13' then
> >>>> insert into quotes_2012_09_13 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_14' then
> >>>> insert into quotes_2012_09_14 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_15' then
> >>>> insert into quotes_2012_09_15 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_16' then
> >>>> insert into quotes_2012_09_16 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_17' then
> >>>> insert into quotes_2012_09_17 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_18' then
> >>>> insert into quotes_2012_09_18 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_19' then
> >>>> insert into quotes_2012_09_19 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_20' then
> >>>> insert into quotes_2012_09_20 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_21' then
> >>>> insert into quotes_2012_09_21 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_22' then
> >>>> insert into quotes_2012_09_22 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_23' then
> >>>> insert into quotes_2012_09_23 values (NEW.*) using new;
> >>>> return;
> >>>> when '2012_09_24' then
> >>>> insert into quotes_2012_09_24 values (NEW.*) using new;
> >>>> return;
> >>>> end case
> >>>> RETURN NULL;
> >>>> END;
> >>>> $$
> >>>> LANGUAGE plpgsql;
> >>>>
> >>>>
> >>>> However I've got no speed improvement.
> >>>> I need to keep two weeks worth of partitions at a time, that's why all the WHEN statements.
> >>>> Wish postgres could automate the partition process natively like the other sql db.
> >>>>
> >>>> Thank you guys for your help.
> >>>>
> >>>> --
> >>>> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> >>>> To make changes to your subscription:
> >>>> http://www.postgresql.org/mailpref/pgsql-performance
> >>> --
> >>> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> >>> To make changes to your subscription:
> >>> http://www.postgresql.org/mailpref/pgsql-performance
> >>
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> 		 	   		  

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux