Correct use of cursors for very large result sets in Postgres

Mike Beaton <mjsbeaton@xxxxxxxxx> · Fri, 17 Feb 2017 07:59:10 +0000

**Short version of my question:**

If I hold a cursor reference to an astronomically huge result set in my client code, would it be ridiculous (i.e. completely defeats the point of cursors) to issue "FETCH ALL FROM cursorname" as my next command? Or would this slowly stream the data back to me as I consume it (at least in principle, assuming that I have a well written driver sitting between me and Postgres)?

**More detail**

If I understand things at all correctly, then Postgres cursors are REALLY for dealing with the following problem [even though they can be used (abused?) for other things, such as returning multiple different result sets from one function]:

> Note: The current implementation of RETURN NEXT and RETURN QUERY
> stores the entire result set before returning from the function, as
> discussed above. That means that if a PL/pgSQL function produces a
> very large result set, performance might be poor: data will be written
> to disk to avoid memory exhaustion, but the function itself will not
> return until the entire result set has been generated.

(ref: https://www.postgresql.org/docs/9.6/static/plpgsql-control-structures.html)

But (again if I understand correctly) when you write a function which returns a cursor then the whole query is NOT buffered into memory (and disk) before the user of the function can start to consume anything, but instead the results can be consumed bit by bit. (There is more overhead setting up and using the cursor, but it's worth it to avoid massive buffer allocation for very large result sets.)

(ref: https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551)

I would like to understand how this relates to SELECTS and FETCHES over the wire to a Postgres server.

In all cases, I'm talk about consuming results from client code which is communicating with Postgres on a socket behind the scenes (using the Npgsql library in my case, actually).

Q1: What if I try to execute "SELECT * FROM AstronomicallyLargeTable" as my only command over the wire to Postgres? Will that allocate all the memory for the entire select and then start to send data back to me? Or will it (effectively) generate its own cursor and stream the data back a little at a time (with no huge additional buffer allocation on the server)?

Q2: What if I already have a cursor reference to an astronomically large result set (say because I've already done one round trip, and got back the cursor reference from some function), and then I execute "FETCH ALL FROM cursorname" over the wire to Postgres? Is that stupid, because it will allocate ALL the memory for all the results *on the Postgres server* before sending anything back to me? Or will "FETCH ALL FROM cursorname" actually work as I'd like it to, streaming the data back slowly as I consume it, without any massive buffer allocation happening on the Postgres server?