
Re: Selecting K random rows - efficiently!

> All you're doing is picking random =subsequences= from the same permutation of the original data.

You have some good points in your reply. I am very much aware of the non-random behavior you point out for the "static random-value column" approach, but at least it is fast, which is a requirement. :-( However, if the lifetime of the individual rows is short, the behavior is, luckily, sufficiently random for my specific purpose.
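Here is a minimal Python sketch of why a static column only yields subsequences of one permutation. It assumes the approach works like this (the exact query is not spelled out here): each row gets a random value once, at insert time; that column is indexed; and a sample is read roughly as `WHERE rand_col >= random() ORDER BY rand_col LIMIT k`. The names `build_table` and `pick_k_static`, and the wrap-around when too few rows lie above the threshold, are hypothetical.

```python
import bisect
import random

def build_table(n, seed=0):
    """Hypothetical table of n rows, each given a random value once at insert
    time. Keeping the list sorted by that value stands in for the index on the
    random-value column."""
    rng = random.Random(seed)
    return sorted((rng.random(), row_id) for row_id in range(n))

def pick_k_static(table, k, threshold):
    """Rough analogue of: WHERE rand_col >= threshold ORDER BY rand_col LIMIT k,
    wrapping around to the smallest values if we run off the end."""
    keys = [r for r, _ in table]
    start = bisect.bisect_left(keys, threshold)
    n = len(table)
    return [table[(start + i) % n][1] for i in range(min(k, n))]

if __name__ == "__main__":
    table = build_table(1000)
    perm = [row_id for _, row_id in table]  # the one fixed permutation
    rng = random.Random(42)
    for _ in range(3):
        sample = pick_k_static(table, 5, rng.random())
        pos = perm.index(sample[0])
        # Each "random" sample is just a window of consecutive rows in perm.
        assert sample == [perm[(pos + j) % len(perm)] for j in range(5)]
        print(sample)
```

Deleting rows and inserting new ones with fresh random values changes the underlying permutation, which is why short-lived rows make the samples look random enough in practice.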

I also realize that the only way to get truly random samples in PostgreSQL is to ORDER BY random(), but this is unacceptably slow for large data sets. Even though it is far from trivial, there ARE algorithms [1,2] for picking random subsets from a result set, but these are (sadly) not implemented in PostgreSQL.
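For comparison, here is a minimal Python sketch of one well-known algorithm of this kind, classic reservoir sampling (often called Algorithm R), next to the sort-based approach that ORDER BY random() LIMIT k amounts to. It is not necessarily the method described in [1] or [2], and the function names are made up for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(rows: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Reservoir sampling (Algorithm R): a single pass over the rows with O(k)
    memory; every k-subset of the input is equally likely to be returned."""
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)      # fill the reservoir with the first k rows
        else:
            j = rng.randrange(i + 1)   # uniform in [0, i]
            if j < k:                  # happens with probability k / (i + 1)
                reservoir[j] = row     # evict a uniformly chosen resident
    return reservoir

def order_by_random_sample(rows: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """The ORDER BY random() LIMIT k analogue: tag every row with a random key
    and sort all n of them, O(n log n) time and O(n) memory."""
    tagged = [(rng.random(), row) for row in rows]
    tagged.sort(key=lambda pair: pair[0])
    return [row for _, row in tagged[:k]]

if __name__ == "__main__":
    rng = random.Random(0)
    print(reservoir_sample(range(100_000), 5, rng))
    print(order_by_random_sample(range(100_000), 5, rng))
```

Note that reservoir sampling still reads every row once; what it avoids is sorting and holding the whole result set.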


References:
[1] http://portal.acm.org/citation.cfm?id=304206
[2] http://compstat.chonbuk.ac.kr/Sisyphus/CurrentStudy/Sampling/vldb86.pdf
