Re: Performance on Bulk Insert to Partitioned Table

Charles Gomes <charlesrg@xxxxxxxxxxx> · Fri, 21 Dec 2012 09:30:07 -0500

The BBU does combines the writes.

I've benchmarked using a single table and it took 1:34:21.549959 to insert 1188000000 rows. (70 writers to a single table)

I've also benchmarked having writers targeting individual partitions and they get the same job done in 1 Hour.

I/O is definitely not the botleneck.

Without changing hardware it accelerates things almost 4 times, looks like to be a delay on the way Postgresql handles the partitions or the time taking for the trigger to select what partition to insert.

When targeting I issue commands that insert directly into the partition "INSERT INTO quotes_DATE VALUES() ..,..,...,.., " 10k rows at time.
When not targeting I leave to the trigger to decide:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()RETURNS trigger AS $$

DECLARE

tablename varchar(24);

bdate varchar(10);

edate varchar(10);

BEGIN

tablename = 'quotes_' || to_char(new.received_time,'YYYY_MM_DD');

EXECUTE 'INSERT INTO '|| tablename ||' VALUES (($1).*)'
USING NEW ;

RETURN NULL;

END;

$$

LANGUAGE plpgsql;

Maybe translating this trigger to C could help. But I haven't heart anyone that did use partitioning with a trigger in C and I don't have the know how on it without examples.

________________________________
> Date: Thu, 20 Dec 2012 19:24:09 -0800 
> Subject: Re:  Performance on Bulk Insert to Partitioned Table 
> From: jeff.janes@xxxxxxxxx 
> To: charlesrg@xxxxxxxxxxx 
> CC: pgsql-performance@xxxxxxxxxxxxxx 
>  
>  
>  
> On Thursday, December 20, 2012, Charles Gomes wrote: 
> Jeff, 
>  
> The 8288 writes are fine, as the array has a BBU, it's fine. You see  
> about 4% of the utilization. 
>  
> BBU is great for latency, but it doesn't do much for throughput, unless  
> it is doing write combining behind the scenes.  Is it HDD or SSD behind  
> the BBU?  Have you bench-marked it on randomly scattered 8k writes? 
>  
> I've seen %util reports that were low while watching a strace showed  
> obvious IO freezes.  So I don't know how much faith to put into low  
> %util. 
>  
>  
>  
> To target directly instead of doing : 
> INSERT INTO TABLE VALUES () 
> I use: 
> INSERT INTO TABLE_PARTITION_01 VALUES() 
>  
> But how is it deciding what partition to use?  Does it have to  
> re-decide for every row, or does each thread serve only one partition  
> throughout its life and so makes the decision only once? 
>  
>  
>  
> By targeting it I see a huge performance increase. 
>  
> But is that because by targeting you are by-passing the the over-head  
> of triggers, or is it because you are loading the rows in an order  
> which leads to more efficient index maintenance? 
>  
>  
> I haven't tested using 1Billion rows in a single table. The issue is  
> that in the future it will grow to more than 1 billion rows, it will  
> get to about 4Billion rows and that's when I believe partition would be  
> a major improvement. 
>  
> The way that partitioning gives you performance improvements is by you  
> embracing the partitioning, for example by targeting the loading to  
> just one partition without any indexes, creating indexes, and then  
> atomically attaching it to the table.  If you wish to have partitions,  
> but want to use triggers to hide that partitioning from you, then I  
> don't think you can expect to get much of a speed up through using  
> partitions. 
>  
> Any way, the way I would approach it would be to load to a single  
> un-partitioned table, and also load to a single dummy-partitioned table  
> which uses a trigger that looks like the one you want to use for real,  
> but directs all rows to a single partition.  If these loads take the  
> same time, you know it is not the trigger which is limiting. 
>  
> Cheers, 
>  
> Jeff 		 	   		  

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance