Re: Proposal to introduce a shuffle function to intarray extension

Martin Kalcher <martin.kalcher@xxxxxxxxxxxxxxx> · Sat, 16 Jul 2022 22:21:54 +0200

Am 16.07.22 um 18:53 schrieb Mladen Gogala:
On 7/15/22 04:36, Martin Kalcher wrote:
Dear list,

i am dealing with an application that processes fairly large arrays of
integers. It makes heavy use of the intarray extension, which works
great in most cases. However, there are two requirements that cannot
be addressed by the extension and are rather slow with plain SQL. Both
can be met with shuffling:

- Taking n random members from an integer array
- Splitting an array into n chunks, where each member is assigned to a
random chunk

Shuffling is currently implemented by unnesting the array, ordering
the members by random() and aggregating them again.

Martin, have you considered PL/Python and NumPy module?

Hey Mladen,

thank you for your advice. Unfortunately the performance of shuffling 
with NumPy is about the same as with SQL.

  create function numpy_shuffle(arr int[])
  returns int[]
  as $$
    import numpy
    numpy.random.shuffle(arr)
    return arr
  $$ language 'plpython3u';

  select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text
  from (
    select numpy_shuffle(arr) arr from numbers
  ) shuffled;

  -------------------------------------------------------
   {674026,3306457,1727170} ... {343875,3825484,1235246}

  Time: 2315.431 ms (00:02.315)

Am i doing something wrong?

Martin