Am 16.07.22 um 18:53 schrieb Mladen Gogala:
On 7/15/22 04:36, Martin Kalcher wrote:
Dear list,
i am dealing with an application that processes fairly large arrays of
integers. It makes heavy use of the intarray extension, which works
great in most cases. However, there are two requirements that cannot
be addressed by the extension and are rather slow with plain SQL. Both
can be met with shuffling:
- Taking n random members from an integer array
- Splitting an array into n chunks, where each member is assigned to a
random chunk
Shuffling is currently implemented by unnesting the array, ordering
the members by random() and aggregating them again.
Martin, have you considered PL/Python and NumPy module?
Hey Mladen,
thank you for your advice. Unfortunately the performance of shuffling
with NumPy is about the same as with SQL.
create function numpy_shuffle(arr int[])
returns int[]
as $$
import numpy
numpy.random.shuffle(arr)
return arr
$$ language 'plpython3u';
select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text
from (
select numpy_shuffle(arr) arr from numbers
) shuffled;
-------------------------------------------------------
{674026,3306457,1727170} ... {343875,3825484,1235246}
Time: 2315.431 ms (00:02.315)
Am i doing something wrong?
Martin