Search Postgresql Archives

Re: optimizing a cpu-heavy query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Joel Reymont <joelr1@xxxxxxxxx> writes:
> I'm trying to optimize the following query that performs KL Divergence [1]. As you can see the distance function operates on vectors of 150 floats. 

> CREATE OR REPLACE FUNCTION docs_within_distance(vec topics, threshold float) 
> RETURNS TABLE(id doc_id, distance float) AS $$
> BEGIN
> 	RETURN QUERY
>     SELECT * 
>     FROM (SELECT doc_id, (SELECT sum(vec[i] * ln(vec[i] / topics[i])) 
>                       FROM generate_subscripts(topics, 1) AS i
>                       WHERE topics[i] > 0) AS distance
>           FROM docs) AS tab
>     WHERE tab.distance <= threshold;
> END;
> $$ LANGUAGE plpgsql;

Yikes.  That sub-select is a mighty expensive way to compute the scalar
product.  Push it into a sub-function that takes the two arrays and
iterates over them with a for-loop.  For another couple orders of
magnitude, convert the sub-function to C code.  (I don't think you need
a whole data type, just a function that does the scalar product.)

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux