Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

Bryn Llewellyn <bryn@xxxxxxxxxxxx> · Wed, 22 Mar 2023 12:09:03 -0700

laurenz.albe@xxxxxxxxxxx wrote:

...I understand that you ask questions to gain deeper understanding.

bryn@xxxxxxxxxxxx wrote:
...I had never come across use cases where [scrollability] was beneficial. I wanted, therefore, to hear about some. I thought that insights here would help me understand the mechanics.

I recently used cursor scrollability, so I can show you a use case:

github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49

The goal is to get the query result count right away, without having to run a second query for it: you declare the cursor, move to the end of the result set, fetch the ROW_COUNT, then move back to the beginning of the result set and start fetching the result rows.

...I personally find that reading the PostgreSQL documentation gets you far, but only so far: for deep understanding, you have to read the code. It is usually well
documented and readable, and I have come to see it as an extension of the documentation that covers the details.

Thanks for the link to your SQL file at the line where you get the row count in the way that you describe. I saw that this is in the PL/pgSQL source text for function "materialize_foreign_table()" (~200 lines). And I saw that you use the cursor mechanism that we're discussing here in only one other function, "db_migrate_refresh()" (~480 lines). But this second one doesn't use the "move forward all" device to get a row count. I looked only at these two functions.

I noted that neither of these functions has a refcursor formal argument and that, rather, you open (i.e. create) each of the three cursors that you use within the two functions that uses them. I noted, too,  that for the three "select" statements that you use to open your refcursors, none of these has an "order by". I noted that your code implements "create table destination" and "insert into destination... select from source..." where order doesn't matter. 

However, source code famously reveals only what it does and not what the author's intention, and overall design philosophy, is. I looked at the README accounts for these two functions here:

github.com/cybertec-postgresql/db_migrator#materialize_foreign_table
github.com/cybertec-postgresql/db_migrator#db_migrate_refresh

But these accounts say nothing about the design of their implementations. The accounts, and the bigger picture, seem to imply that read consistency in the presence of concurrent activity from other sessions is not a concern. I'm guessing that this is accounted for upstream from how the code that I'm looking at operates—i.e. that the source database is extracted to staging tables like, say, export does so that your code operates as the only session that reads from, creates, and populates the tables that it references.

The upshot, therefore, is that I'm afraid that I can only guess at why you use "open, fetch, close" on a refcursor rather than an ordinary cursor for loop. After all, you use the technique only to traverse metadata tables about partitions, subpartitions, and columns. I'd be astonished if such tables have humongous numbers of rows (like hundreds of millions).

As a sanity test, I did this:

create function s.t_count()
  returns int
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  cur refcursor := 'cur';
  n int not null := 0;
begin
  open cur for select v from s.t;
  move forward all in cur;
  get diagnostics n = row_count;
  move absolute 0 in cur;
  return n;
end;
$body$;

Then I timed these two alternatives on a test table with ten million rows.

select count(*) from s.t;
select s.t_count();

They both got the same answer. But my function took about twice as long as the native count(*). Mind you, with "only" 10 million rows (and using a PG cluster in a VM on my Mac Book) the times were tiny: ~600ms versus ~300ms.

Then I tried this:

create procedure s.p(tab in text, mode in text)
  set search_path = pg_catalog, pg_temp
  security definer
  language plpgsql
as $body$
declare
  cur       constant refcursor not null := 'cur';
  stmt      constant text      not null := format('select k, v from s.%I',     tab);
  cnt_stmt  constant text      not null := format('select count(*) from s.%I', tab);

  kk   int not null := 0;
  vv   int not null := 0;
  k    int;
  v    int;
  n    int not null := 0;
  cnt  int not null := 0;
begin
  case mode
    when 'naive' then
      execute cnt_stmt into cnt;
      for kk, vv in execute stmt loop
        n := n + 1;
        assert kk = n and vv = n*2;
      end loop;
      assert n = cnt;

    when 'refcursor' then
      open cur for execute stmt;
      move forward all in cur;
      get diagnostics cnt = row_count;
      move absolute 0 in cur;

      loop
        fetch cur into k, v;
        exit when not found;
        n := n + 1;
        assert k = n and v = n*2;
      end loop;
      close cur; -- Just as a formality.
      assert n = cnt;
  end case;
end;
$body$

I created my 10 million row table like this:

with g(v) as (select generate_series(1, 10*1000*1000))
insert into s.t(k, v) select g.v, (g.v)*2 from g;

And I timed each mode a few times to get a reasonable average. The naive mode took ~7.5 sec; and the refcursor mode took ~13 sec. I don't s'pose that the difference matters much (and anyway, in your use case, all those "create table" and "insert select" statements would dominate the times). But it seems safe to say that the refcursor mode doesn't _help_ performance. Moreover, because the naive mode uses the cursor mechanism under the covers, the memory consumption discussion is the same for both approaches.

I appreciate that, in the face of concurrent sessions making changes to the content of "s.t", and using the default "read committed" isolation level, the before-the-fact separate count(*) might disagree with the after-the-fact value from counting the rows that are traversed. And you need the count at the start of the loop so that you can create the right number of (sub)partitions. So I do see (because other considerations, including code complexity, are so similar) that the refcursor mode is more stylish. Maybe that's it.

Certainly, it was very useful to think about this and to try the tests that I did. So thank you very much for showing me your code.
————————————————————

Finally, I see how an understanding of internals helps the understanding of performance-related question. But I find it very hard to accept that I should read the C implementation of PostgreSQL in order to get the proper mental model, and the proper terms of art, that I need to understand it semantics (and the corresponding SQL and PL/pgSQL syntax). Having said that, I did a little test an saw that this:

move last in cur;
get diagnostics n = row_count;

reported just 1—in contrast to what you used:

move forward all in cur;

I've no idea how it's possible to navigate to the last result in the set without knowing how many there are. Maybe that fact is there internally—but with no explicit SQL or PL/pgSQL exposure of the value.