On 12/7/09 11:12 AM, "Ben Brehmer" <benbrehmer@xxxxxxxxx> wrote:

> Thanks for the quick responses. I will respond to all questions in one
> email:
>
> COPY command: Unfortunately I'm stuck with INSERTs due to the nature in
> which this data was generated (Hadoop/MapReduce).

If you have control over the MapReduce output, you can have it write result
files in a format that COPY likes. If you don't have any control over that,
it's more complicated.

I use a final-pass, map-only Hadoop job to go over the output and insert
into Postgres directly from the job, using the multi-row insert style

    INSERT INTO <table> VALUES (val1, val2, ...), (val1, val2, ...), ...

from Java, with about 80 rows per INSERT statement and a single transaction
for about a thousand of these. This was faster than batch inserts.

> On 07/12/2009 10:39 AM, Thom Brown wrote:
>>
>> 2009/12/7 Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx>
>>
>>> Ben Brehmer <benbrehmer@xxxxxxxxx> wrote:
>>>
>>>> -7.5 GB memory
>>>> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
>>>> -64-bit platform
>>>
>>> What OS?
>>>
>>>> (PostgreSQL 8.1.3)
>>>
>>> Why use such an antiquated, buggy version? Newer versions are faster.
>>>
>>> -Kevin
>>
>> I'd agree with trying to use the latest version you can.
>>
>> How are you loading this data? I'd make sure you haven't got any
>> indices, primary keys, triggers or constraints on your tables before
>> you begin the initial load; just add them afterwards. Also use either
>> the COPY command for loading, or prepared statements. Individual
>> INSERT commands will just take way too long.
>>
>> Regards
>>
>> Thom
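
To make the multi-row INSERT approach above concrete, here is a minimal
sketch of that pattern in plain JDBC. The table my_table(a, b), the
connection URL and the credentials are hypothetical placeholders, and the
80-rows / 1000-statements figures simply mirror the numbers mentioned above:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class MultiRowInsert {
        // Rows per INSERT and statements per transaction, mirroring
        // the ~80 rows / ~1000 statements described above.
        private static final int ROWS_PER_INSERT = 80;
        private static final int INSERTS_PER_TX = 1000;

        public static void load(List<String[]> rows) throws SQLException {
            // Hypothetical connection details; substitute your own.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "secret")) {
                conn.setAutoCommit(false); // commit in large batches, not per row

                int stmtsInTx = 0;
                for (int i = 0; i < rows.size(); i += ROWS_PER_INSERT) {
                    int n = Math.min(ROWS_PER_INSERT, rows.size() - i);

                    // Build "INSERT INTO my_table VALUES (?,?),(?,?),..."
                    // with n row groups.
                    StringBuilder sql =
                        new StringBuilder("INSERT INTO my_table VALUES ");
                    for (int r = 0; r < n; r++) {
                        sql.append(r == 0 ? "(?,?)" : ",(?,?)");
                    }

                    try (PreparedStatement ps =
                             conn.prepareStatement(sql.toString())) {
                        int p = 1;
                        for (int r = 0; r < n; r++) {
                            String[] row = rows.get(i + r);
                            ps.setString(p++, row[0]);
                            ps.setString(p++, row[1]);
                        }
                        ps.executeUpdate();
                    }

                    // Commit every INSERTS_PER_TX statements.
                    if (++stmtsInTx >= INSERTS_PER_TX) {
                        conn.commit();
                        stmtsInTx = 0;
                    }
                }
                conn.commit(); // flush the final partial batch
            }
        }
    }

The win here comes from two things: many rows per statement amortizes
per-statement overhead, and many statements per commit amortizes per-commit
overhead.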
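And for the case where you do control the output format, a sketch of
streaming a CSV file through COPY using the PostgreSQL JDBC driver's
CopyManager API. The table name and file path are placeholders; the COPY
statement itself is standard "COPY ... FROM STDIN":

    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class CopyLoad {
        public static void main(String[] args)
                throws SQLException, IOException {
            // Hypothetical connection details and file path.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
                 Reader csv = new FileReader("/tmp/mapreduce-output.csv")) {
                // CopyManager streams the reader straight into
                // COPY ... FROM STDIN, avoiding per-row INSERT overhead.
                CopyManager copy = ((PGConnection) conn).getCopyAPI();
                long rows = copy.copyIn(
                    "COPY my_table FROM STDIN WITH CSV", csv);
                System.out.println("Loaded " + rows + " rows");
            }
        }
    }

This is generally the fastest client-side load path, which is why the
advice in the thread keeps coming back to COPY.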