On 05/10/2011 11:26 PM, Scott Marlowe wrote:
> I.e. don't grab 1,000 rows and work on them on the client side and then insert data; do the data mangling in the query, in the database. My experience has been that moving things like this into the database can result in performance gains of several factors, taking hour-long processes and making them run in minutes.
This is a problem I encounter constantly wherever I go. Programmer selects millions of rows from a giant table. Programmer loops through the results one by one, doing some magic on them. Programmer submits queries back to the database. Even in batches, that's going to take ages.
Databases are beasts at set-based operations. If the programmer can build a temp table of any kind and load that, they can turn their update/insert/whatever into a simple JOIN that runs several orders of magnitude faster. Going the route of parallelism will probably work too, but I doubt it's the right solution in this case.
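As a rough sketch only (the table and column names here are hypothetical, not from the original thread), the pattern looks something like this:

    -- Stage the client-side data in one COPY or batched INSERT
    -- instead of issuing row-at-a-time statements.
    CREATE TEMP TABLE staging (id bigint PRIMARY KEY, new_value text);
    COPY staging FROM stdin;   -- or a multi-row INSERT from the application

    -- One set-based statement replaces the per-row loop.
    UPDATE big_table b
       SET some_column = s.new_value
      FROM staging s
     WHERE b.id = s.id;

The planner can then do a single hash or merge join over the whole set instead of millions of individual lookups and client round trips.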
When there are tables with millions of rows involved, processing 111 rows per second is a bug. Even with ten perfectly balanced threads, 30 hours only becomes three. On decent hardware, you can probably drop, reload, and index the entire table faster than that.
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx