Joel Reymont <joelr1@xxxxxxxxx> writes: > I'm trying to optimize the following query that performs KL Divergence [1]. As you can see the distance function operates on vectors of 150 floats. > CREATE OR REPLACE FUNCTION docs_within_distance(vec topics, threshold float) > RETURNS TABLE(id doc_id, distance float) AS $$ > BEGIN > RETURN QUERY > SELECT * > FROM (SELECT doc_id, (SELECT sum(vec[i] * ln(vec[i] / topics[i])) > FROM generate_subscripts(topics, 1) AS i > WHERE topics[i] > 0) AS distance > FROM docs) AS tab > WHERE tab.distance <= threshold; > END; > $$ LANGUAGE plpgsql; Yikes. That sub-select is a mighty expensive way to compute the scalar product. Push it into a sub-function that takes the two arrays and iterates over them with a for-loop. For another couple orders of magnitude, convert the sub-function to C code. (I don't think you need a whole data type, just a function that does the scalar product.) regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general