On 12/7/09 11:12 AM, "Ben Brehmer" <benbrehmer@xxxxxxxxx> wrote:

> Thanks for the quick responses. I will respond to all questions in one
> email:
>
> COPY command: Unfortunately I'm stuck with INSERTs due to the nature in
> which this data was generated (Hadoop/MapReduce).

If you have control over the MapReduce output, you can have it write result
files in a format that COPY likes. If you don't have any control over that,
it's more complicated.

I use a final-pass, map-only Hadoop job to go over the output and insert
into Postgres directly from the job, using the multi-row insert style

    INSERT INTO <table> VALUES (val1, val2, ...), (val1, val2, ...), ...

from Java, with about 80 rows per INSERT statement and a single transaction
for about a thousand of these. This was faster than batch inserts.

> On 07/12/2009 10:39 AM, Thom Brown wrote:
>>
>> 2009/12/7 Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx>
>>
>>> Ben Brehmer <benbrehmer@xxxxxxxxx> wrote:
>>>
>>>> -7.5 GB memory
>>>> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
>>>> -64-bit platform
>>>
>>> What OS?
>>>
>>>> (PostgreSQL 8.1.3)
>>>
>>> Why use such an antiquated, buggy version? Newer versions are faster.
>>>
>>> -Kevin
>>
>> I'd agree with trying to use the latest version you can.
>>
>> How are you loading this data? I'd make sure you haven't got any
>> indices, primary keys, triggers or constraints on your tables before
>> you begin the initial load; just add them afterwards. Also use either
>> the COPY command for loading, or prepared statements. Individual
>> INSERT commands will just take way too long.
>>
>> Regards
>>
>> Thom
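
To make the multi-row INSERT approach above concrete, here is a minimal
sketch of that pattern in plain JDBC. The table my_table(a, b), the
connection URL and the credentials are hypothetical placeholders, and the
80-rows / 1000-statements figures simply mirror the numbers mentioned above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class MultiRowInsert {
        // Rows per INSERT and statements per transaction, mirroring
        // the ~80 rows / ~1000 statements described above.
        private static final int ROWS_PER_INSERT = 80;
        private static final int INSERTS_PER_TX = 1000;

        public static void load(List<String[]> rows) throws SQLException {
            // Hypothetical connection details; substitute your own.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
                conn.setAutoCommit(false); // commit in large batches, not per row

                int stmtsInTx = 0;
                for (int i = 0; i < rows.size(); i += ROWS_PER_INSERT) {
                    int n = Math.min(ROWS_PER_INSERT, rows.size() - i);

                    // Build "INSERT INTO my_table VALUES (?,?),(?,?),..."
                    // with n row groups.
                    StringBuilder sql =
                        new StringBuilder("INSERT INTO my_table VALUES ");
                    for (int r = 0; r < n; r++) {
                        sql.append(r == 0 ? "(?,?)" : ",(?,?)");
                    }

                    try (PreparedStatement ps =
                             conn.prepareStatement(sql.toString())) {
                        int p = 1;
                        for (int r = 0; r < n; r++) {
                            String[] row = rows.get(i + r);
                            ps.setString(p++, row[0]);
                            ps.setString(p++, row[1]);
                        }
                        ps.executeUpdate();
                    }

                    // Commit every INSERTS_PER_TX statements.
                    if (++stmtsInTx >= INSERTS_PER_TX) {
                        conn.commit();
                        stmtsInTx = 0;
                    }
                }
                conn.commit(); // flush the final partial batch
            }
        }
    }

The win here comes from two things: many rows per statement amortizes
per-statement overhead, and many statements per commit amortizes per-commit
overhead.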
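And for the case where you do control the output format, a sketch of
streaming a CSV file through COPY using the PostgreSQL JDBC driver's
CopyManager API. The table name and file path are placeholders; the COPY
statement itself is standard "COPY ... FROM STDIN":

    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyLoad {
        public static void main(String[] args)
                throws SQLException, IOException {
            // Hypothetical connection details and file path.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
                 Reader csv = new FileReader("/tmp/mapreduce-output.csv")) {
                // CopyManager streams the reader straight into
                // COPY ... FROM STDIN, avoiding per-row INSERT overhead.
                CopyManager copy = ((PGConnection) conn).getCopyAPI();
                long rows = copy.copyIn(
                    "COPY my_table FROM STDIN WITH CSV", csv);
                System.out.println("Loaded " + rows + " rows");
            }
        }
    }

This is generally the fastest client-side load path, which is why the
advice in the thread keeps coming back to COPY.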