Re: How to run in parallel in Postgres, EXECUTE_PARALLEL

Lars Aksel Opsahl <Lars.Opsahl@xxxxxxxx> · Sun, 8 Dec 2019 21:59:51 +0000

> From: 
Joe Conway
> Sent: Sunday, December 8, 2019 9:04 PM
> To: 
Lars Aksel 
Opsahl; Laurenz 
Albe; pgsql-performance@lists.postgresql.org
> Subject: Re: How to run in parallel in
Postgres, EXECUTE_PARALLEL
> 
> On 12/8/19 1:14 PM,
Lars 
Aksel Opsahl wrote:
> > Do you or anybody know if there are any plans for a function call that
> > support the calling structure below or something like it and that then
> > could finish in 1 second ? (If you are calling a void function, the
> > return value should not be any problem.)
> > 
> > DO
> > $body$
> > *DECLARE* 
> > command_string_list text[3];
> > *BEGIN*
> > command_string_list[0] = 'SELECT pg_sleep(1)';
> > command_string_list[1] = 'SELECT pg_sleep(1)';
> > command_string_list[2] = 'SELECT pg_sleep(1)';
> > EXECUTE_PARALLEL command_string_list;
> > *END*
> > $body$;
> > 
> > The only way to this today as I understand it, is to open 3 new
> > connections back to the database which you can be done in different ways. 
> 
> Yes, correct.
> 
> > If we had a parallel functions like the one above it's easier to
> > make parallel 
sql without using complex scripts, java, 
python or other
> > system.
> 
> It does require one connection per statement, but with
dblink it is not
> necessarily all that complex. For example (granted, this could use more
> error checking, etc.):
> 
> 8<----------------
> CREATE OR REPLACE FUNCTION
>  execute_parallel(stmts text[])
> RETURNS text AS
> $$
> declare
>   i
int;
>   retv text;
>   conn text;
>   connstr text;
>   rv
int;
>   db text := current_database();
> begin
>   for i in 1..array_length(stmts,1) loop
>     
conn := 'conn' || i::text;
>     
connstr := 'dbname=' ||
db;
>     
perform dblink_connect(conn, 
connstr);
>     
rv := dblink_send_query(conn,
stmts[i]);
>   end loop;
>   for i in 1..array_length(stmts,1) loop
>     
conn := 'conn' || i::text;
>     
select val into 
retv
>     
from dblink_get_result(conn) as d(val text);
>   end loop;
>   for i in 1..array_length(stmts,1) loop
>     
conn := 'conn' || i::text;
>     
perform dblink_disconnect(conn);
>   end loop;
>   return 'OK';
>   end;
> $$ language 
plpgsql;
> 8<----------------
> 
> And then:
> 
> 8<----------------
> \timing
> DO $$
>   declare
>     
stmts text[];
>   begin
>     
stmts[1] = 'select pg_sleep(1)';
>     
stmts[2] = 'select pg_sleep(1)';
>     
stmts[3] = 'select pg_sleep(1)';
>   PERFORM execute_parallel(stmts);
>   end;
> $$ LANGUAGE 
plpgsql;
> DO
> Time: 1010.831 
ms (00:01.011)
> 8<----------------
> 
> HTH,
> 
> 
Joe
> -- 
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development

Hi

Thanks a lot it works like a charm. https://github.com/larsop/find-overlap-and-gap/tree/use_dblink_for_parallel
(The test is failing now because it seems like drop EXTENSION dblink; is not cleaning up every thing)

As you say we need some error handling. And maybe  some retry if not enough free connections and a parameter
 for max parallel connections and so on.

So far this is best solution I have seen.

Thanks.

Lars