Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Wed, 22 Mar 2023 15:45:12 -0700

On 3/22/23 12:09, Bryn Llewellyn wrote:
laurenz.albe@xxxxxxxxxxx <mailto:laurenz.albe@xxxxxxxxxxx> wrote:

...I understand that you ask questions to gain deeper understanding.

bryn@xxxxxxxxxxxx <mailto:bryn@xxxxxxxxxxxx> wrote:

...I had never come across use cases where [scrollability] was 
beneficial. I wanted, therefore, to hear about some. I thought that 
insights here would help me understand the mechanics.

I recently used cursor scrollability, so I can show you a use case:

github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49 <http://github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49>

Thanks for the link to your SQL file at the line where you get the row 
count in the way that you describe. I saw that this is in the PL/pgSQL 
source text for function "materialize_foreign_table()" (~200 lines). And 
I saw that you use the cursor mechanism that we're discussing here in 
only one other function, "db_migrate_refresh()" (~480 lines). But this 
second one doesn't use the "move forward all" device to get a row count. 
I looked only at these two functions.

I noted that neither of these functions has a refcursor formal argument 
and that, rather, you open (i.e. create) each of the three cursors that 
you use within the two functions that uses them. I noted, too,  that for 
the three "select" statements that you use to open your refcursors, none 
of these has an "order by". I noted that your code implements "create 
table destination" and "insert into destination... select 
from source..." where order doesn't matter.

CREATE FUNCTION materialize_foreign_table(
   schema name,
   table_name name,
   with_data boolean DEFAULT TRUE,
   pgstage_schema name DEFAULT NAME 'pgsql_stage'
) RETURNS boolean
   LANGUAGE plpgsql VOLATILE STRICT SET search_path = pg_catalog AS
$$DECLARE
   ft                 name;
   stmt               text;
   errmsg             text;
   detail             text;
   cur_partitions     refcursor;
   cur_subpartitions  refcursor;

...

CREATE FUNCTION db_migrate_refresh(
   plugin         name,
   staging_schema name    DEFAULT NAME 'fdw_stage',
   pgstage_schema name    DEFAULT NAME 'pgsql_stage',
   only_schemas   name[]  DEFAULT NULL
) RETURNS integer
   LANGUAGE plpgsql VOLATILE CALLED ON NULL INPUT SET search_path = 
pg_catalog AS
$$DECLARE
   extschema               text;
   old_msglevel            text;
   v_plugin_schema         text;
   v_create_metadata_views regproc;
   v_translate_datatype    regproc;
   v_translate_identifier  regproc;
   v_translate_expression  regproc;
   c_col                   refcursor

...

What is not formal about the above?

Though it does not matter as, back to the docs again:

https://www.postgresql.org/docs/current/plpgsql-cursors.html

"All access to cursors in PL/pgSQL goes through cursor variables, which 
are always of the special data type refcursor. One way to create a 
cursor variable is just to declare it as a variable of type refcursor. 
Another way is to use the cursor declaration syntax, which in general is:

name [ [ NO ] SCROLL ] CURSOR [ ( arguments ) ] FOR query;
"

Though maybe you are trying to differentiate between bound and unbound 
refcursor's, where the ones defined in the functions are unbound.

Also order by is not relevant for getting a count.

Finally, I see how an understanding of internals helps the understanding 
of performance-related question. But I find it very hard to accept that 
I should read the C implementation of PostgreSQL in order to get the 
proper mental model, and the proper terms of art, that I need to 
understand it semantics (and the corresponding SQL and PL/pgSQL syntax). 
Having said that, I did a little test an saw that this:

move last in cur;
get diagnostics n = row_count;

reported just 1—in contrast to what you used:

move forward all in cur;

Because they are different actions.

The first directly moves to the last row in the cursor and counts just 
that row

The second moves through all the rows left in the cursor and counts all 
the rows it traversed.

It is spelled out here:

https://www.postgresql.org/docs/current/sql-move.html

"The count is the number of rows that a FETCH command with the same 
parameters would have returned (possibly zero)."

I've no idea how it's possible to navigate to the last result in the set 
without knowing how many there are. Maybe that fact is there 
internally—but with no explicit SQL or PL/pgSQL exposure of the value.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx