On Tue, Apr 20, 2010 at 12:20 PM, David Kerr <dmk@xxxxxxxxxxxxxx> wrote: > On Tue, Apr 20, 2010 at 02:12:15PM -0400, Nikolas Everett wrote: > - On Tue, Apr 20, 2010 at 2:03 PM, David Kerr <dmk@xxxxxxxxxxxxxx> wrote: > - > - > that thought occured to me while I was testing this. I ran a vacuumdb -z > - > on my database during the load and it didn't impact performance at all. > - > > - > Incidentally the code is written to work like this : > - > > - > while (read X lines in file){ > - > Process those lines. > - > write lines to DB. > - > } > - > > - > So i would generally expect to get the benefits of the updated staticis > - > once the loop ended. no? (would prepared statements affect that possibly?) > - > > - > Also, while I was debugging the problem, I did load a 2nd file into the DB > - > ontop of one that had been loaded. So the statistics almost certinaly > - > should > - > have been decent at that point. > - > > - > I did turn on log_min_duration_statement but that caused performance to be > - > unbearable, > - > but i could turn it on again if it would help. > - > > - > Dave > - > - > - You can absolutely use copy if you like but you need to use a non-standard > - jdbc driver: kato.iki.fi/sw/db/postgresql/jdbc/copy/. I've used it in the > - past and it worked. > - > - Is the whole thing going in in one transaction? I'm reasonably sure > - statistics aren't kept for uncommited transactions. > - > - For inserts the prepared statements can only help. For selects they can > - hurt because eventually the JDBC driver will turn them into back end > - prepared statements that are only planned once. The price here is that that > - plan may not be the best plan for the data that you throw at it. > - > - What was log_min_duration_statement logging that it killed performance? > - > - --Nik > > Good to know about the jdbc-copy. but this is a huge project and the load is > just one very very tiny component, I don't think we could introduce anything > new to assist that. > > It's not all in one tx. I don't have visibility to the code to determine how > it's broken down, but most likely each while loop is a tx. > > I set it to log all statements (i.e., = 0.). that doubled the load time from > ~15 to ~30 hours. I could, of course, be more granular if it would be helpful. So are you logging to the same drive that has pg_xlog and your data/base directory on this machine? -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance