Re: Benchmarking partitioning triggers and rules

Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> · Sun, 12 Apr 2015 19:21:08 -0500

On 4/8/15 4:58 PM, Tim Uckun wrote:
So is there a third and even faster way of doing this?

Please don't top-post.

On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx
<mailto:adrian.klaver@xxxxxxxxxxx>> wrote:

    On 04/07/2015 07:49 PM, Tim Uckun wrote:

        I understand that there is overhead involved in parsing the
        strings and
        such.  The amount of overhead was surprising to me but that's
        another
        matter.  What I am really puzzled about is the difference
        between the
        statements

        EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
        ($1).*' USING NEW ;

        and

            EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||
        ' VALUES
        (($1).*)' USING NEW ;

    Offhand I would say because in the first case you are doing a SELECT
    and in the second you are just doing a substitution.

        They both do string interpolation but one is significantly
        faster than
        the other.   Is there a third and even faster way?

        I am using RDS so I can't really do stored procs in C.

There are only 3 ways you can "steer" data into the correct partition:

1) Outside the database (ie: the application itself knows what partition 
table to use)
2) Rules
3) Triggers

#1 will almost always be faster (I say almost because if something has 
to decide row-by-row it may be faster for the database to do it).

#2 and #3 depend on the workload. Rules are parsed once PER STATEMENT, 
so if you're using a single INSERT to load a lot of rows they might well 
be faster than triggers. But as mentioned, they're a real PITA to use 
and they don't work at all with COPY.

When it comes to triggers, you will get different performance depending 
on the trigger language used as well as how you write the trigger 
itself. Presumably a trigger function written in C will be faster than 
anything else. I would expect plpgsql to be the next fastest after that, 
but that's just a guess.

As for the difference between SELECT and VALUES above, that's probably 
due to a difference in parsing or in the number of transforms the NEW 
has to go through in the executor. My guess is that because SELECT is 
much more general purpose than VALUES it's both more expensive to parse 
as well as more expensive to execute. If you really want to know for 
certain, connect gdb to a database backend on your laptop/desktop, fire 
off an insert (to load plpgsql into backend memory), set a gdb 
breakpoint on exec_stmt_block(), and see what's different between the 
two use cases.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general