Cursor With_Hold Performance Workarounds/Optimization

Leon Winter <winter-pg@xxxxxxxxxxxxx> · Tue, 10 Oct 2017 14:20:39 +0200

Hi,

during implementation of a runtime environment and the adjoining database
abstraction layer I noticed (like many before me [0] and as correctly mentioned
in the documentation) the sizeable performance impact of declaring a cursor
"with hold" for queries with large result sets.

Our use case very often looks like this:

open cursor for select from table1
loop
{ fetch some entries from cursor
  update table2
  commit
}

During iteration of the result set we commit changes to the database so we
must make sure to keep the cursor alive. One option is to use "with hold".
Unfortunately the resultset is then instantly materialzed which is a huge
performance burden.
In our use case the "commit" of changes often does not affect the iteration set.
Also the loop might be aborted before the resultset was fully read so we never
needed the whole materialzed set anyway.
To workaround these problems, we already employ some static analysis to avoid
"with hold" in all situations where there are no commits during the lifetime of
cursor or portal. For other cursors we choose to use a different database
connection inside the same application to protect the cursors from commit
operations and avoiding costly copy operations (if they would be used "with
hold" on the main database connection).
In an attempt to further minimize the performance impact I am thinking about
employing a lazy "with hold" where I would fetch all the remaining result rows
from a cursor or portal before a commit statement. This way I could at least
have great performance in all conflict-free situations until one would arrive at
an impass. Naturally I am now wondering why the postgres cursor/portal is not
also employing the same trick (at least as an option): Postpone materialization
of "with hold" cursors until it is required (like a commit operation is
dispatched).
Probably I am also missing many (internal) aspects but at that point it might be
possible to optimize further. When, for instance, no changes were made to result
set of the "with hold" cursor, it must not be materialized. From a previous
discussions [1] I heard that one can in fact accomplish that by using a different
database connection which is one workaround we are using.
I am not sure whether this kind of workaround/optimization work should be done
in the database abstraction/interface layer or the database itself. Since a lot
of people seem to run into the peformance issue many might profit from some
optimization magic in the database for such use cases. We are very invested in
this performance issue and are happy to resolve it on either level.

Regards,
Leon

[0] https://trac.osgeo.org/qgis/ticket/1175
    https://stackoverflow.com/questions/33635405/postgres-cursor-with-hold
    https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/
[1] https://bytes.com/topic/postgresql/answers/420717-cursors-transactions-why
    http://www.postgresql-archive.org/setFetchSize-td4935215.html

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance