hi,

I would recommend converting the input with some scripts into pg_dump
format and importing it into small temporary tables without indexes;
once all the data is in the db, you can partition it.

Alternatively, you can pre-partition your data with a simple grep, so
that you can import it directly into the partitioned schema.
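Something like this is what I mean by the staging-table route (an
untested sketch; the staging table and file names are made up, the
column layout is taken from Rick's mail below):

-- staging table: no indexes, no trigger, no CHECK constraints
CREATE TABLE allcalls_staging (
    phonenum    bigint,
    callstarted timestamp without time zone,
    status      int,
    attempts    int,
    duration    int
);

-- one bulk COPY, with no per-row trigger or index overhead
COPY allcalls_staging FROM '/data/allcalls-csv/allcalls.csv' WITH CSV;

-- then route the rows into the partitions in bulk, bypassing the
-- trigger; create the indexes only after this step
INSERT INTO allcalls_0 SELECT * FROM allcalls_staging
    WHERE phonenum < 1000000000;
INSERT INTO allcalls_1 SELECT * FROM allcalls_staging
    WHERE phonenum >= 1000000000 AND phonenum < 2000000000;
-- ...and so on for the remaining partitions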
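And the pre-partitioning with grep can be as simple as this (again
untested; it assumes phonenum is the first CSV column, as in Rick's
sample data):

# phonenum < 1000000000: at most 9 digits before the first comma
grep -E '^[0-9]{1,9},' allcalls.csv > allcalls_0.csv
# 1000000000 <= phonenum < 2000000000: 10 digits starting with 1
grep -E '^1[0-9]{9},' allcalls.csv > allcalls_1.csv
# ...and so on up to allcalls_9.csv, then COPY each chunk straight
# into its partition, e.g.:
#   COPY allcalls_1 FROM '/data/allcalls-csv/allcalls_1.csv' WITH CSV;

(A single awk pass, e.g.
awk -F, '{ print > ("allcalls_" int($1/1000000000) ".csv") }' allcalls.csv
would avoid reading the 16Gb file ten times.)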
kirk

Thom Brown wrote:
> On 28 March 2010 18:33, Rick Casey <caseyrick@xxxxxxxxx> wrote:
>
>     After careful research, I would like to post the following problem
>     I'm having with the importing of a large (16Gb) CSV file. Here is a
>     brief synopsis:
>     - this is running on Postgres (PG) version: PostgreSQL 8.3.9 on
>       i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu
>       4.3.2-1ubuntu11) 4.3.2
>     - it is running on an Ubuntu (small) server instance at Amazon Web
>       Services (AWS), with a 320Gb volume mounted for the PG data
>       directory
>     - the database was created using the partitioning example in the
>       documentation, with an insert trigger and a function that directs
>       records to the right table
>     (see below for the code that created my tables and trigger)
>
>     After some days of attempting to import the full 16Gb CSV file, I
>     decided to split it up using the split utility in Linux. This
>     seemed to improve things; once I had split the CSV file into files
>     of about 10Mb, I finally got my first successful import, of about
>     257,000 recs. However, this is going to be a rather labor-intensive
>     process for the full 16Gb file if I have to split it up manually
>     and import each smaller file separately.
>
>     So, I am wondering if there is any way to optimize this process? I
>     have been using Postgres for several years, but have never had to
>     partition or optimize it for files of this size until now.
>     Any comments or suggestions would be most welcome from this
>     excellent forum.
>
>     (I might add that I spent several weeks prior to this trying to get
>     this to work in MySQL, which I finally had to abandon.)
>
>     Sincerely,
>     Rick
>
>     Details of the code follow:
>
>     Here is the basic COPY command, which I run as the postgres user,
>     to import the CSV files:
>     <begin>
>     COPY allcalls FROM '/data/allcalls-csv/sub3ab' WITH CSV;
>
>     Here is what some sample data look like in the files:
>     3153371867,2008-02-04 16:11:00,1009,1,40
>     2125673062,2008-02-04 16:11:00,1009,1,41
>     5183562377,2008-02-04 16:11:00,1009,1,50
>     ...
>
>     Here are the basic scripts that created the partitioned table and
>     insert trigger:
>     CREATE TABLE allcalls (
>         phonenum bigint,
>         callstarted timestamp without time zone,
>         status int,
>         attempts int,
>         duration int
>     );
>     CREATE TABLE allcalls_0 (
>         CHECK ( phonenum < 1000000000 )
>     ) INHERITS (allcalls);
>     ...(repeat this 9 more times, for 10 subpartition tables)
>
>     CREATE INDEX allcalls_0_phonenum ON allcalls_0 (phonenum);
>     ...(repeat this 9 more times, for indexes on the 10 subpartition
>     tables)
>
>     CREATE OR REPLACE FUNCTION allcalls_insert_trigger()
>     RETURNS TRIGGER AS $$
>     BEGIN
>         IF ( NEW.phonenum < 1000000000 ) THEN
>             INSERT INTO allcalls_0 VALUES (NEW.*);
>         ELSIF ( NEW.phonenum >= 1000000000 AND
>                 NEW.phonenum < 2000000000 ) THEN
>             INSERT INTO allcalls_1 VALUES (NEW.*);
>     ...(again, repeat for the rest of the partition tables)
>
>     CREATE TRIGGER insert_phonenum_trigger
>     BEFORE INSERT ON allcalls
>     FOR EACH ROW EXECUTE PROCEDURE allcalls_insert_trigger();
>     <end>
>
> The problem here is that you appear to require an index update, trigger
> firing and constraint check for every single row. The first thing I'd
> suggest is to remove the indexes and apply them after your import;
> otherwise it has to update the index for every single entry. The
> trigger won't help either: import into a single table and split it out
> into further tables afterwards if required. And finally, the constraint
> should probably be applied afterwards too, so cull any violating rows
> after importing.
>
> Thom

--
Nagy Zoltan (kirk) <kirk@xxxxxxxx>
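ps: if you go with Thom's single-table route, the post-import cleanup
could look roughly like this for each partition (untested; shown for
allcalls_0 only):

-- cull the rows that fall outside this partition's range
DELETE FROM allcalls_0 WHERE phonenum >= 1000000000;

-- then add the CHECK constraint and the index in one pass over the
-- loaded data, instead of paying for them on every inserted row
ALTER TABLE allcalls_0 ADD CONSTRAINT allcalls_0_phonenum_check
    CHECK ( phonenum < 1000000000 );
CREATE INDEX allcalls_0_phonenum ON allcalls_0 (phonenum);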