This is a problem I encounter constantly wherever I go: a programmer
selects millions of rows from a giant table, loops through the results
one by one doing some magic on each row, then submits queries back to
the database. Even in batches, that's going to take ages.
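
Roughly what that pattern looks like, written server-side in PL/pgSQL
just to keep the sketch self-contained (hypothetical "orders" table;
in practice the loop usually lives in application code and every
UPDATE is a separate round trip):

CREATE TABLE orders (id int PRIMARY KEY, amount numeric, amount_with_tax numeric);

DO $$
DECLARE
    r record;
BEGIN
    -- row-by-row: fetch everything, then issue one statement per row
    FOR r IN SELECT id, amount FROM orders LOOP
        UPDATE orders SET amount_with_tax = r.amount * 1.2 WHERE id = r.id;
    END LOOP;
END
$$;

-- the set-based equivalent does the same work in a single statement:
UPDATE orders SET amount_with_tax = amount * 1.2;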
Reminds me of a recent question on stackoverflow:
http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations
And the answer:
http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations/5954041#5954041
The OP was thinking "row-based", with subqueries playing the role of
"doing some magic".
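
Schematically (hypothetical tables and columns, not the actual query
from that thread), the row-based version looks something like this,
with each subquery re-executed once per outer row:

CREATE TABLE students (student_id int PRIMARY KEY, cohort text);
CREATE TABLE grades   (student_id int REFERENCES students(student_id), grade numeric);

SELECT s.student_id,
       (SELECT avg(grade) FROM grades g WHERE g.student_id = s.student_id) AS avg_grade,
       (SELECT count(*)   FROM grades g WHERE g.student_id = s.student_id) AS n_grades
FROM   students s;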
Using a set-based solution with cascading CTEs (each one using the
previous CTE as its source for the next level of aggregation)
=> 100x speedup!
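
With the same hypothetical schema, the cascading-CTE pattern looks
like this: each level aggregates the previous CTE with a single scan
instead of firing one subquery per row.

WITH per_student AS (
    SELECT student_id, avg(grade) AS avg_grade, count(*) AS n_grades
    FROM   grades
    GROUP  BY student_id
),
per_cohort AS (
    SELECT s.cohort,
           avg(p.avg_grade) AS cohort_avg,
           sum(p.n_grades)  AS total_grades
    FROM   per_student p
    JOIN   students s USING (student_id)
    GROUP  BY s.cohort
)
SELECT * FROM per_cohort;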