On Tue, Apr 20, 2010 at 02:12:15PM -0400, Nikolas Everett wrote: - On Tue, Apr 20, 2010 at 2:03 PM, David Kerr <dmk@xxxxxxxxxxxxxx> wrote: - - > that thought occured to me while I was testing this. I ran a vacuumdb -z - > on my database during the load and it didn't impact performance at all. - > - > Incidentally the code is written to work like this : - > - > while (read X lines in file){ - > Process those lines. - > write lines to DB. - > } - > - > So i would generally expect to get the benefits of the updated staticis - > once the loop ended. no? (would prepared statements affect that possibly?) - > - > Also, while I was debugging the problem, I did load a 2nd file into the DB - > ontop of one that had been loaded. So the statistics almost certinaly - > should - > have been decent at that point. - > - > I did turn on log_min_duration_statement but that caused performance to be - > unbearable, - > but i could turn it on again if it would help. - > - > Dave - - - You can absolutely use copy if you like but you need to use a non-standard - jdbc driver: kato.iki.fi/sw/db/postgresql/jdbc/copy/. I've used it in the - past and it worked. - - Is the whole thing going in in one transaction? I'm reasonably sure - statistics aren't kept for uncommited transactions. - - For inserts the prepared statements can only help. For selects they can - hurt because eventually the JDBC driver will turn them into back end - prepared statements that are only planned once. The price here is that that - plan may not be the best plan for the data that you throw at it. - - What was log_min_duration_statement logging that it killed performance? - - --Nik Good to know about the jdbc-copy. but this is a huge project and the load is just one very very tiny component, I don't think we could introduce anything new to assist that. It's not all in one tx. I don't have visibility to the code to determine how it's broken down, but most likely each while loop is a tx. I set it to log all statements (i.e., = 0.). that doubled the load time from ~15 to ~30 hours. I could, of course, be more granular if it would be helpful. Dave -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance