Search Postgresql Archives

PG & random() strangeness

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I'm getting strange results with PostgreSQL random() function. It would be great if someone could either show where I am wrong or PG is wrong. Here is what I do (PG 8.4.3, x86_64 platform);

I basically try to create the table with the column filled with random numbers (either integer or doubles). And I'm getting too many collisions.

Here is the SQL code:
-----------------------

begin;
select setseed(0);
create temporary table tmpx1 as select ((random())) as id from
	generate_series(1,100000);
select id ,count(*)  from tmpx1 group by (id) having count(*)>1;

drop table tmpx1;

select setseed(0);
create temporary table tmpx2 as select ((random())*9223372036854775806)::bigint as id from
	generate_series(1,100000);
select id ,count(*)  from tmpx2 group by (id) having count(*)>1;
commit;
----
Here is the output:

------------
BEGIN
 setseed
---------

(1 row)

SELECT
         id         | count
--------------------+-------
  0.519465064629912 |     2
 0.0100625408813357 |     2
  0.394671014975756 |     2
(3 rows)

DROP TABLE
 setseed
---------

(1 row)

SELECT
         id          | count
---------------------+-------
 4791219551230492672 |     2
   92810558184620032 |     2
 3640197603284484096 |     2
(3 rows)

COMMIT
-------------

So among 10^5 random numbers there are already 3 collisions. Which doesn't seem right for the function which generate randoms of double precision

It is also interesting that in the table tmpx1 (if I actually output the values using select id from tmpx1) I do not see equal numbers at all but I see for example two values which are close to each other:
   0.511193082667887
   0.511194097343832

First it seems strange that they have been merged into one group by "groupby". Although I understand all standard issues related to the storage of floats and their comparison, I don't get why two numbers which differ by much less than precision were actually merged into one (given the double datatype). Although I'm surprized by that result, I can accept it because "groupby" doesn't make too much sense with floats anyway. But then I dont understand why am I getting the same problems when try to scale my random number to bigint (the part of the example with tmpx2 table).

In order to understand what's going on. I also tried to produce a simple program which replicates what PG does when generating random numbers (from utils/adt/float.c). I also use the same seed=0 to get the same numbers.

----------------
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#define MAX_RANDOM_VALUE  (0x7FFFFFFF)

main()
{
  int i;
  int iseed = (int) (0 * MAX_RANDOM_VALUE);
  srandom((unsigned int) iseed);
  for(i=0; i<=100000; i++)
  {
    double result = (double) random() / ((double) MAX_RANDOM_VALUE + 1);
    fprintf(stdout,"%.20f\n",result);
  }
}

--------------

And this program ./a.out | sort -n | uniq -c Doesn't produce any duplicates at all. There are two close numbers
    0.511193082667887
    0.511194097343832
exactly the same as what PG sees.

So it doesn't seem like related to some random generator problems. So it
rather seems to me that the floating precision is lost somewhere in PG (although I may be wrong).

Does anyone have an explanation for what I see ? Did I miss something obvious ?

Thanks in advance,
	Sergey

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, University of Cambridge, UK
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@xxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux