Yes, of course; that's all verified and taken into account during code initialization.
From: Vitalii Tymchyshyn [mailto:vit@xxxxxx]
Sent: Wednesday, February 22, 2017 8:14 PM
To: John Gorman; pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
For JDBC there are certain prerequisites for `setFetchSize` to work, e.g. using a forward-only result set and running inside a transaction (autocommit off).
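As a reference point, here is a minimal sketch of those prerequisites with the PostgreSQL JDBC driver. The helper names (`isStreamingConfigured`, `configureStreaming`) are illustrative, not part of any API; the conditions themselves (autocommit off, forward-only result set, positive fetch size) are what the driver requires before it will fetch in batches instead of buffering everything:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Streaming {

    // The PgJDBC driver only uses batched, cursor-based fetching when all of
    // these hold; otherwise it buffers the entire result set in client memory.
    public static boolean isStreamingConfigured(boolean autoCommit,
                                                int resultSetType,
                                                int fetchSize) {
        return !autoCommit                                      // must run inside a transaction
                && resultSetType == ResultSet.TYPE_FORWARD_ONLY // forward-only result set
                && fetchSize > 0;                               // positive fetch size (0 = fetch all)
    }

    // Illustrative helper: set up a statement that satisfies the prerequisites.
    public static Statement configureStreaming(Connection conn, int fetchSize)
            throws SQLException {
        conn.setAutoCommit(false); // prerequisite: no autocommit
        Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                            ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(fetchSize); // rows requested per round trip
        return st;
    }
}
```

If any one of the three conditions is missed, the driver silently falls back to reading the whole result set, which looks exactly like the behaviour described below.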
My experience with cursors in PostgreSQL with Java has been to stay away from them. We support
two databases with our product: PostgreSQL (the default) and SQL Server. While re-encrypting data in a database, the application used cursors with a fetch size of 1,000.
It worked perfectly on SQL Server, and on PostgreSQL until we got to a PostgreSQL table with
more than 11 million rows. After spending weeks trying to figure out what was happening, I realized that once a table exceeded roughly 10 million rows, for some reason the cursor functionality silently stopped working and the driver was reading the entire
table. I asked another very senior architect to look at it, and he came to the same conclusion. Because of limited time, I ended up working around it using LIMIT/OFFSET.
Again, we are using Java, so the problem could just be in the PostgreSQL JDBC driver. Also,
we were on 9.1 at the time.
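For what it's worth, the LIMIT/OFFSET workaround mentioned above can be sketched as follows. The class and table/column names are hypothetical; each page is an independent query, so no server-side cursor state is involved:

```java
import java.util.ArrayList;
import java.util.List;

public class Paging {

    // Build one page query. Ordering by a unique key keeps page boundaries
    // stable between queries (table and column names are hypothetical).
    public static String pageQuery(String table, String orderCol,
                                   long offset, int pageSize) {
        return "SELECT * FROM " + table
                + " ORDER BY " + orderCol
                + " LIMIT " + pageSize
                + " OFFSET " + offset;
    }

    // The sequence of offsets needed to cover rowCount rows.
    public static List<Long> offsets(long rowCount, int pageSize) {
        List<Long> result = new ArrayList<>();
        for (long off = 0; off < rowCount; off += pageSize) {
            result.add(off);
        }
        return result;
    }
}
```

Note that OFFSET still makes the server scan and discard the skipped rows, so later pages get progressively slower on very large tables; it is a workaround, not a substitute for working batched fetches.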
Regards
John
From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Mike Beaton
Sent: Tuesday, February 21, 2017 6:49 AM
To: pgsql-performance@xxxxxxxxxxxxxx
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
Thanks, Tom.
Wouldn't this mean that cursors are noticeably non-optimal even for normal data sizes, since the entire data set to be streamed from the table is always duplicated into another buffer and only then streamed?
> if you want the whole query result at once, why are you bothering with a cursor?
So, following that advice, it's not unreasonable that I would actually have a cursor over a large dataset.
I would then ideally want to be able to fetch the data from that cursor without the entire dataset getting duplicated as I go (even if
only a bit at a time instead of all at once, which seems to be the best-case behaviour).
Additionally, I thought that if I have a streaming use case (which I do) and a streaming data-access layer (which I do), then
since `SELECT * FROM large` is absolutely fine, end to end, in that situation, by symmetry and the principle of least astonishment `FETCH ALL FROM cursor` might be fine too.
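The batched alternative to `FETCH ALL` can be sketched as follows. The cursor name and batch size are illustrative; the helpers just build the SQL a client would send, inside a transaction, to pull rows in bounded chunks instead of all at once:

```java
public class CursorFetch {

    // SQL to open a server-side cursor (must run inside a transaction).
    public static String declare(String cursor, String query) {
        return "DECLARE " + cursor + " NO SCROLL CURSOR FOR " + query;
    }

    // SQL to pull the next batch; the server sends only batchSize rows per
    // FETCH, so the client never needs to buffer the whole result.
    public static String fetchBatch(String cursor, int batchSize) {
        return "FETCH FORWARD " + batchSize + " FROM " + cursor;
    }
}
```

Looping on `FETCH FORWARD n` until zero rows come back gives the incremental behaviour being asked for here, at the cost of one round trip per batch.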