Hello,
I'm getting strange results with PostgreSQL random() function. It would be
great if someone could either show where I am wrong or PG is wrong. Here
is what I do (PG 8.4.3, x86_64 platform);
I basically try to create the table with the column filled with random
numbers (either integer or doubles). And I'm getting too many collisions.
Here is the SQL code:
-----------------------
begin;
select setseed(0);
create temporary table tmpx1 as select ((random())) as id from
generate_series(1,100000);
select id ,count(*) from tmpx1 group by (id) having count(*)>1;
drop table tmpx1;
select setseed(0);
create temporary table tmpx2 as select ((random())*9223372036854775806)::bigint as id from
generate_series(1,100000);
select id ,count(*) from tmpx2 group by (id) having count(*)>1;
commit;
----
Here is the output:
------------
BEGIN
setseed
---------
(1 row)
SELECT
id | count
--------------------+-------
0.519465064629912 | 2
0.0100625408813357 | 2
0.394671014975756 | 2
(3 rows)
DROP TABLE
setseed
---------
(1 row)
SELECT
id | count
---------------------+-------
4791219551230492672 | 2
92810558184620032 | 2
3640197603284484096 | 2
(3 rows)
COMMIT
-------------
So among 10^5 random numbers there are already 3 collisions. Which doesn't
seem right for the function which generate randoms of double precision
It is also interesting that in the table tmpx1 (if I actually output the
values using select id from tmpx1) I do not see equal numbers at all but I see
for example two values which are close to each other:
0.511193082667887
0.511194097343832
First it seems strange that they have been merged into one group by "groupby".
Although I understand all standard issues related to the storage of
floats and their comparison, I don't get why two numbers which
differ by much less than precision were actually merged into one (given
the double datatype). Although I'm surprized by that result, I can accept
it because "groupby" doesn't make too much sense with floats anyway. But
then I dont understand why am I getting the same problems when try to
scale my random number to bigint (the part of the example with tmpx2
table).
In order to understand what's going on. I also tried to produce a simple
program which replicates what PG does when generating random numbers
(from utils/adt/float.c). I also use the same seed=0 to get the same
numbers.
----------------
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#define MAX_RANDOM_VALUE (0x7FFFFFFF)
main()
{
int i;
int iseed = (int) (0 * MAX_RANDOM_VALUE);
srandom((unsigned int) iseed);
for(i=0; i<=100000; i++)
{
double result = (double) random() / ((double) MAX_RANDOM_VALUE + 1);
fprintf(stdout,"%.20f\n",result);
}
}
--------------
And this program ./a.out | sort -n | uniq -c Doesn't produce any
duplicates at all. There are two close numbers
0.511193082667887
0.511194097343832
exactly the same as what PG sees.
So it doesn't seem like related to some random generator problems. So it
rather seems to me that the floating precision is lost somewhere in PG
(although I may be wrong).
Does anyone have an explanation for what I see ? Did I miss something
obvious ?
Thanks in advance,
Sergey
*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, University of Cambridge, UK
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@xxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general