PG & random() strangeness

"Sergey E. Koposov" <math@xxxxxxxxxx> · Tue, 4 May 2010 18:36:23 +0400 (MSD)

Hello,

I'm getting strange results with PostgreSQL random() function. It would be 
great if someone could either show where I am wrong or PG is wrong. Here 
is what I do (PG 8.4.3, x86_64 platform);

I basically try to create the table with the column filled with random 
numbers (either integer or doubles). And I'm getting too many collisions.

Here is the SQL code:
-----------------------

begin;
select setseed(0);
create temporary table tmpx1 as select ((random())) as id from
	generate_series(1,100000);
select id ,count(*)  from tmpx1 group by (id) having count(*)>1;

drop table tmpx1;

select setseed(0);
create temporary table tmpx2 as select ((random())*9223372036854775806)::bigint as id from
	generate_series(1,100000);
select id ,count(*)  from tmpx2 group by (id) having count(*)>1;
commit;
----
Here is the output:

------------
BEGIN
 setseed
---------

(1 row)

SELECT
         id         | count
--------------------+-------
  0.519465064629912 |     2
 0.0100625408813357 |     2
  0.394671014975756 |     2
(3 rows)

DROP TABLE
 setseed
---------

(1 row)

SELECT
         id          | count
---------------------+-------
 4791219551230492672 |     2
   92810558184620032 |     2
 3640197603284484096 |     2
(3 rows)

COMMIT
-------------

So among 10^5 random numbers there are already 3 collisions. Which doesn't 
seem right for the function which generate randoms of double precision

It is also interesting that in the table tmpx1 (if I actually output the 
values using select id from tmpx1) I do not see equal numbers at all but I see 
for example two values which are close to each other:
   0.511193082667887
   0.511194097343832

First it seems strange that they have been merged into one group by "groupby". 
Although I understand all standard issues related to the storage of 
floats and their comparison, I don't get why two numbers which 
differ by much less than precision were actually merged into one (given 
the double datatype). Although I'm surprized by that result, I can accept 
it because "groupby" doesn't make too much sense with floats anyway. But 
then I dont understand why am I getting the same problems when try to 
scale my  random number to bigint (the part of the example with tmpx2 
table).

In order to understand what's going on. I also tried to produce a simple 
program which replicates what PG does when generating random numbers 
(from utils/adt/float.c). I also use the same seed=0 to get the same 
numbers.

----------------
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#define MAX_RANDOM_VALUE  (0x7FFFFFFF)

main()
{
  int i;
  int iseed = (int) (0 * MAX_RANDOM_VALUE);
  srandom((unsigned int) iseed);
  for(i=0; i<=100000; i++)
  {
    double result = (double) random() / ((double) MAX_RANDOM_VALUE + 1);
    fprintf(stdout,"%.20f\n",result);
  }
}

--------------

And this program ./a.out | sort -n | uniq -c Doesn't produce any 
duplicates at all. There are two close numbers
    0.511193082667887
    0.511194097343832
exactly the same as what PG sees.

So it doesn't seem like related to some random generator problems. So it
rather seems to me that the floating precision is lost somewhere in PG 
(although I may be wrong).

Does anyone have an explanation for what I see ? Did I miss something 
obvious ?

Thanks in advance,
	Sergey

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, University of Cambridge, UK
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@xxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general