"Druckenmueller, Marc" <marc.druckenmueller@xxxxxxxxxxx> writes:
> I am investigating possible throughput with PostgreSQL 14.4 on an ARM i.MX6 Quad CPU (NXP sabre board).
> Testing with a simple python script (running on the same CPU), I get ~1000 request/s.

That does seem pretty awful for modern hardware, but it's hard to
tease apart the various potential causes.  How beefy is that CPU
really?  Maybe the overhead is all down to client/server network round
trips?  Maybe psycopg is doing something unnecessarily inefficient?

For comparison, on my development workstation I get

[ create the procedure manually in db test ]
$ cat bench.sql
call dummy_call(1,2,3,array[1,2,3]::float8[]);
$ pgbench -f bench.sql -n -T 10 test
pgbench (16beta1)
transaction type: bench.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 353891
number of failed transactions: 0 (0.000%)
latency average = 0.028 ms
initial connection time = 7.686 ms
tps = 35416.189844 (without initial connection time)

and it'd be more if I weren't using an assertions-enabled debug build.
It would be interesting to see what you get from exactly that test
case on your ARM board.

BTW, one thing I see that's definitely an avoidable inefficiency in
your test is that you're forcing the array parameter to real[]
(i.e. float4) when the procedure takes double precision[]
(i.e. float8).  That forces an extra run-time conversion.  Swapping
between float4 and float8 in my pgbench test doesn't move the needle
a lot, but it's noticeable.

Another thing to think about is that psycopg might be defaulting to
a TCP rather than Unix-socket connection, and that might add overhead
depending on what kernel you're using.  Although, rather than try to
micro-optimize that, you probably ought to be thinking of how to
remove network round trips altogether.
I can get upwards of 300K calls/second if I push the loop to the
server side:

test=# \timing
Timing is on.
test=# do $$
declare x int := 1; a float8[] := array[1,2,3];
begin
  for i in 1..1000000 loop
    call dummy_call (x,x,x,a);
  end loop;
end $$;
DO
Time: 3256.023 ms (00:03.256)
test=# select 1000000/3.256023;
      ?column?
---------------------
 307123.137643683721
(1 row)

Again, it would be interesting to compare exactly that test case on
your ARM board.

			regards, tom lane
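
For anyone repeating both test cases on another machine, the figures
above can be put side by side with a little arithmetic.  This is only
a rough sketch: the helper name calls_per_second is made up for
illustration, and the inputs are just the numbers quoted in this
message (the ~1000 request/s figure is the approximate one from the
original report).

```python
def calls_per_second(calls: int, elapsed_ms: float) -> float:
    """Convert a call count and an elapsed time in milliseconds to calls/s.

    Hypothetical helper, not part of PostgreSQL, pgbench, or psycopg.
    """
    return calls / (elapsed_ms / 1000.0)


# Server-side DO loop: 1000000 calls in 3256.023 ms (from \timing above).
server_side = calls_per_second(1_000_000, 3256.023)

# pgbench, one client: tps as reported, excluding initial connection time.
pgbench_tps = 35416.189844

# Approximate rate reported for the Python script on the ARM board.
reported_client = 1000.0

# How much faster the server-side loop is than each client-driven test.
speedup_vs_pgbench = server_side / pgbench_tps
speedup_vs_client = server_side / reported_client

if __name__ == "__main__":
    print(f"server-side loop: {server_side:.0f} calls/s")
    print(f"vs pgbench:       {speedup_vs_pgbench:.1f}x")
    print(f"vs python script: {speedup_vs_client:.0f}x")
```

Keep in mind that the pgbench and DO-loop numbers come from a
workstation and the 1000 request/s number from the ARM board, so the
last ratio mixes hardware differences with round-trip overhead; the
comparison only becomes meaningful once all three are measured on the
same machine.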