
Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

laurenz.albe@xxxxxxxxxxx wrote:

bryn@xxxxxxxxxxxx wrote:

I recently used cursor scrollability, so I can show you a use case:

github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49

However, source code famously reveals only what it does, not what the author's intention and overall design philosophy are. I looked at how the README accounts for these two functions here:

github.com/cybertec-postgresql/db_migrator#materialize_foreign_table
github.com/cybertec-postgresql/db_migrator#db_migrate_refresh

But these accounts say nothing about the design of their implementations.

The documentation describes the purpose of the function and how to use it, not how I went about implementing it. You are interpreting too much. I just wanted to show you a case where scrollable cursors can be useful.

The upshot, therefore, is that I'm afraid I can only guess at why you use "open, fetch, close" on a refcursor rather than an ordinary cursor for loop. After all, you use the technique only to traverse metadata tables about partitions, subpartitions, and columns. I'd be astonished if such tables had humongous numbers of rows (like hundreds of millions).

No, I don't expect large result sets, and concurrency won't be a problem. I explained why I used scrollable cursors: to save myself an extra query for the total result set count.

It's only natural to want to understand the design philosophy and things like data volumes and possible concurrency concerns when it's recommended that I look at the 200-line source code for a function. I wanted only to assure you that I'd done my due diligence to get some answers. Thanks for confirming my guess about the data volumes and concurrency concerns. Right, I do see that the approach that you used very naturally expresses what needs to be done:

— Create a txn-duration "pg_cursors" row for the outer "select" of interest.
— Traverse it to get the row count, using "move" so's not to see the rows.
— Fetch the first row and do something special that needs the row count as well as the row.
— Loop over the remaining rows and do more per-row actions.
— Similar logic in an inner loop for a different "select".
— Close the cursors explicitly.
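The steps above can be sketched in PL/pgSQL like this. This is a minimal illustration, not the migrator's actual code; the table "parts" and its column "part_name" are invented for the example:

```sql
do $$
declare
  cur refcursor := 'outer_cursor';  -- names the txn-duration "pg_cursors" row
  n   integer;
  rec record;
begin
  open cur scroll for select part_name from parts order by part_name;

  -- Traverse once with "move" to get the row count without seeing the rows.
  move forward all in cur;
  get diagnostics n = row_count;
  move absolute 0 in cur;

  -- Fetch the first row and do something that needs the count as well as the row.
  fetch next from cur into rec;
  if found then
    raise info 'first of % rows: %', n, rec.part_name;
  end if;

  -- Loop over the remaining rows and do the per-row actions.
  loop
    fetch next from cur into rec;
    exit when not found;
    raise info 'row: %', rec.part_name;
  end loop;

  close cur;
end;
$$;
```

An inner loop over a second cursor for a different "select" would follow the same shape.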

It seems to me (given what you promised me about data volumes and concurrency) that using an explicit "select count(*)" with ordinary loops that have the "select" statements in the "for" header (and that would use the cursor mechanisms under the hood) would work correctly and without harming performance. But I'll agree that such code might seem to be less expressive of its purpose than your approach.
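For comparison, here is a sketch of that alternative: an explicit count plus a "for" loop whose query sits in the loop header (PL/pgSQL opens a cursor for it under the hood). Again, "parts" and "part_name" are invented names, not from the migrator code:

```sql
do $$
declare
  n   integer;
  rec record;
begin
  -- Explicit count up front instead of a "move forward all" traversal.
  select count(*) into n from parts;
  raise info 'total rows: %', n;

  -- Ordinary cursor for loop; no refcursor bookkeeping needed.
  for rec in select part_name from parts order by part_name loop
    raise info 'row: %', rec.part_name;
  end loop;
end;
$$;
```

Note that under concurrent DML the count and the loop here run as two separate queries, so they can disagree unless the isolation level guarantees a stable snapshot.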

Here is another account of how cursors can be useful:
www.cybertec-postgresql.com/en/pagination-problem-total-result-count/

Thanks! It's a very interesting and helpful read.

You wrote « a WITH HOLD cursor... fetches the complete result set at the end of the transaction and materializes it on the server. »

I checked the current doc for the 'declare' statement. It says:

«
In the current implementation, the rows represented by a held cursor are copied into a temporary file or memory area so that they remain available for subsequent transactions.
»

I tested this with "table t(k int primary key, v int not null)" with 100 million rows. (I'm using psql's default mode for "AUTOCOMMIT" to ask for the server's behavior "as is", so no explicit "start transaction" unless I need it. And I'm using a PostgreSQL server in an Ubuntu VM on my MacBook with a psql client in the native macOS.) First I timed a "with hold" cursor creation.

-- Took ~30 sec.
declare "With Hold" scroll cursor with hold for select k, v from t order by k;

Then I timed a "without hold" cursor creation for the same "select":

start transaction;
  -- Took less than ~20 ms
  declare "Without Hold" scroll cursor without hold for select k, v from t;

This certainly bears out what you and the doc said about "with hold".
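For what it's worth, the lifetime difference between the two is easy to demonstrate from the same psql session (assuming the two cursors declared above still exist and the "start transaction" is still open):

```sql
fetch next from "Without Hold";  -- works while the transaction is open
commit;
fetch next from "With Hold";     -- still works: the result set was materialized
fetch next from "Without Hold";  -- fails: the cursor vanished at commit
```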

W.r.t. this thread's "Subject", how do you fetch successive batches from a session-duration cursor in successive requests from a browser when these go through a connection pool, and the general philosophy is "check out connection, use it, release connection", so that successive browser requests will each use a different session? You refer to this with your « transaction level connection pooling won't work with WITH HOLD cursors, since they are bound to the database connection ». (In this context, your point about showing an approximate result is well-taken.) Does anybody use interactive client-server apps these days? Or are you thinking of non-interactive report generation running on a client machine dedicated to that purpose?

W.r.t. your « With WITH HOLD cursor pagination, the whole result set is calculated in the beginning, so the result set count comes for free. », I timed this:

-- Took ~ 6.5 sec.  (Doesn't speed up with repeats.)
do $body$
declare
  cur constant refcursor not null := 'With Hold';
  n integer not null := 0;
begin
  move absolute 0 in cur;
  move forward all in cur;
  get diagnostics n = row_count;
  move absolute 0 in cur;
  raise info 'row_count: %', n;
end;
$body$;

And this:

-- Took ~6 sec. Speeds up to ~3 sec with repeats.
select count(*) from t;

I don't s'pose that what I saw about "count(*)" getting quicker with repeats matters 'cos you'd do it only once. But I remain to be convinced that there's a performance benefit to be got from "comes for free". On the other hand, I do see that, in the face of concurrent DML to the table of interest from other sessions, there's a clear semantic benefit when you get the count from a "with hold" cursor: the count agrees exactly with the rows you'll go on to fetch. This would matter in a use case like the migration one that you showed me, where you don't have the luxury of reading from staging tables in single-user mode.
