On Sat, Apr 30, 2011 at 2:15 PM, Joel Reymont <joelr1@xxxxxxxxx> wrote: > > On Apr 30, 2011, at 7:24 PM, Kevin Grittner wrote: > >> If this is where most of the time is, the next thing is to run it >> with EXPLAIN ANALYZE, and post the output. > > I was absolutely wrong about the calculation taking < 1s, it actually takes about 30s for 2 million rows. > > Still, the difference between 5 minutes and 30s must be the insert. But what exactly are you inserting? The queries you reported below are not the same as the ones you originally described. In particular, they do not seem to use the "threshold" parameter that the original ones did, whose job is presumably to cut the 2 million down to a much smaller number that meet the threshold. But how much smaller is that number? This will have a large effect on how long the insert takes. ... > Digging deeper into the distance function, > > EXPLAIN ANALYZE VERBOSE > SELECT * > FROM (SELECT id, divergence(<array above>, topics) AS distance FROM docs) AS tab > WHERE tab.distance <= 50.0; > > Subquery Scan on tab (cost=0.00..383333.00 rows=666653 width=12) (actual time=0.027..20429.299 rows=2000002 loops=1) > Output: tab.id, tab.distance > Filter: (tab.distance <= 50::double precision) > -> Seq Scan on public.docs (cost=0.00..358333.50 rows=1999960 width=36) (actual time=0.025..19908.200 rows=2000002 loops=1) > Output: docs.id, divergence((<array above>::double precision[])::topics, docs.topics) It looks like "WHERE tab.distance <= 50.0;" is not accomplishing anything. Are you sure the parameter shouldn't be <=0.50 instead? Also, you previously said you didn't mind of this process took a couple minutes, as long as it didn't interfere with other things going on in the database. So you probably need to describe what those other things going on in the database are. Also, you might have a data correctness problem. If the plan is to scan new ads against all docs, and new docs against all ads; then if new rows are added to each table during overlapping transaction, the new ads against new docs comparison will not actually happen. You will probably need to add manual locking to get around this problem. Cheers Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance