Search Postgresql Archives

Proposal to introduce a shuffle function to intarray extension

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear list,

i am dealing with an application that processes fairly large arrays of integers. It makes heavy use of the intarray extension, which works great in most cases. However, there are two requirements that cannot be addressed by the extension and are rather slow with plain SQL. Both can be met with shuffling:

- Taking n random members from an integer array
- Splitting an array into n chunks, where each member is assigned to a random chunk

Shuffling is currently implemented by unnesting the array, ordering the members by random() and aggregating them again.


  create table numbers (arr int[]);

  insert into numbers (arr)
  select array_agg(i)
  from generate_series(1, 4000000) i;


  select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text
  from (
    select array_agg(n order by random()) arr
    from (
      select unnest(arr) n from numbers
    ) plain
  ) shuffled;

  ---------------------------------------------------------
   {2717290,3093757,2426384} ... {3011871,1402540,1613647}

  Time: 2348.961 ms (00:02.349)


I wrote a small extension (see source code below) to see how much we can gain, when the shuffling is implemented in C and the results speak for themselves:


  select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text
  from (
    select shuffle(arr) arr from numbers
  ) shuffled;

  ----------------------------------------------------
   {1313971,3593627,86630} ... {50764,430410,3901128}

  Time: 132.151 ms


I would like to see a function like this inside the intarray extension. Is there any way to get to this point? How is the process to deal with such proposals?

Best regards,
Martin Kalcher


Source code of extension mentioned above:


#include "postgres.h"
#include "port.h"
#include "utils/array.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(shuffle);

void _shuffle(int32 *a, int len);

Datum
shuffle(PG_FUNCTION_ARGS)
{
  ArrayType  *a = PG_GETARG_ARRAYTYPE_P_COPY(0);

  int len;

  if (array_contains_nulls(a))
    ereport(ERROR,
            (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
             errmsg("array must not contain nulls")));

  len = ArrayGetNItems(ARR_NDIM(a), ARR_DIMS(a));

  if (len > 1)
    _shuffle((int32 *) ARR_DATA_PTR(a), len);

  PG_RETURN_POINTER(a);
}

void
_shuffle(int32 *a, int len) {
  int i, j;
  int32 tmp;

  for (i = len - 1; i > 0; i--) {
    j = random() % (i + 1);
    tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }
}








[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux