How important is true randomness?
The goal is an even distribution, but so far I have not seen any way
to produce any kind of random sample efficiently. Note the word
"efficiently". The naive way of taking a random sample of size K:
(SELECT * FROM mydata ORDER BY random() LIMIT <K>)
is clearly not an option for performance reasons. It shouldn't be
necessary to explain why. :-)
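
For anyone who does want it spelled out, here is a rough sketch in Python
of what that query amounts to. The function name naive_sample and the
in-memory list standing in for the table are only illustrative
assumptions; a real planner may use a top-K heap rather than a full sort,
but it still has to read every row and generate a random key for each:

```python
import random

def naive_sample(rows, k):
    # Attach a fresh random sort key to every row, which is the effect
    # of ORDER BY random(): one random number per row, N in total.
    keyed = [(random.random(), row) for row in rows]
    # Sort all N rows by that key, even though only k are wanted.
    keyed.sort(key=lambda pair: pair[0])
    # Keep the first k rows, which is the effect of LIMIT k.
    return [row for _, row in keyed[:k]]

rows = list(range(100000))   # stand-in for the rows of mydata
sample = naive_sample(rows, 10)
print(len(sample))
```

The work is proportional to the size of the whole table no matter how
small K is, which is the problem.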
Search the archives; solutions have been proposed before, though
they probably aren't very quick...
As the subject suggests, performance really matters, and searching the
archives only turns up poor solutions (my first post explains why).