Re: Selecting K random rows - efficiently!

On Wed, Oct 24, 2007 at 10:59:46AM +0200, cluster wrote:
> Another way to look at the problem is: How do I sample a subset of size 
> K efficiently? A query like
> 
>    SAMPLE 1000 OF
>    (SELECT * FROM mydata WHERE <some condition>)

How important is true randomness? To get the best possible distribution
most algorithms require you to either know how many rows there are, or
require you to scan the whole table (or index).
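
For instance, two rough sketches (the <some condition> placeholder is from
your query, and the 1000000 below is just an assumed count of matching rows,
not something PostgreSQL hands you for free):

    -- scan the whole table: exact sample size, but sorts every matching row
    SELECT * FROM mydata WHERE <some condition>
    ORDER BY random() LIMIT 1000;

    -- know the row count in advance: returns roughly 1000 rows on average
    -- in a single sequential pass, no sort needed
    SELECT * FROM mydata WHERE <some condition>
    AND random() < 1000.0 / 1000000;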

With some simplifying assumptions, you can try extracting them from an
index, with the caveat that if your index is unbalanced in any way, the
selection won't be "random".
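
A rough sketch of that idea, assuming mydata has an integer primary key "id"
that starts near 1 and has few gaps (the column name and the 1200
oversampling figure are just assumptions, and I've left out your
<some condition> filter for brevity):

    -- generate 1200 random ids, oversampling to make up for gaps and
    -- duplicates, then fetch the matching rows via the primary key index
    SELECT * FROM mydata
    WHERE id IN (
        SELECT (random() * (SELECT max(id) FROM mydata))::int
        FROM generate_series(1, 1200)
    )
    LIMIT 1000;

The more gaps or skew in the id distribution, the less uniform the sample,
which is exactly the caveat above.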

> should return 1000 random rows from the select statement so that two 
> consecutive evaluations of the query would only with very little 
> probability return the same 1000 rows.
> (Yes, I know that "SAMPLE 1000 OF" is not valid SQL)

Presumably your table is very much bigger than that, in which case I suppose
the not-entirely-random selection is unlikely to play much of a role.

Search the archives; there have been solutions proposed before, though
they probably aren't very quick...

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@xxxxxxxxx>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
