> You can't just restart from scratch, because we may already have shipped rows to the client

For v1, replanning wouldn't be an option if rows have already been shipped, or for DML statements.

> parallel plans and most importantly cursors?

Parallel plans look doable with the same approach, but for cursors I'd probably stop replanning as soon as the first row is delivered to the client, as above.

One could imagine more complex approaches, such as keeping a limited-size buffer of "delivered" rows. A new plan could then be selected even after rows have been shipped, with the already-delivered rows excluded from the new plan's result set via a special extra prepending + dupe-filtering exec node (rough sketches at the end of this mail). The memory and computation costs of that exec node would be factored into the replanning decision like any other node.

> errors if the physical location of a row correlates strongly with a column

This is my largest concern. These cases already lead to large errors today: SELECT * FROM foo WHERE created_date = today LIMIT 1 might scan all the data, only to find all of today's records in the last physical block. It's hard to say whether replacing one bad estimate with another will lead to overall better or worse results. My hope is that in most cases a bunch of plans will be tried, all of them will have their cost estimates revised sharply upward, and then one will be settled on as rows start getting passed to the upper layers.

> underlying node might return a totally inaccurate number of rows for index scans

One might imagine using the last returned row as an extra histogram point when estimating how many rows are left in an index scan (also sketched below). That should at least make the estimate more accurate than it is without feedback.
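To make the delivered-row buffer idea concrete, here is a rough sketch of the dupe-filtering half of that exec node. This is not PostgreSQL source: all the names, the fixed-size array, and the plain integer row key (standing in for a ctid) are illustrative assumptions, and a real node would use a hash table rather than a linear scan.

/* Sketch only: a filter sitting on top of a freshly chosen plan that
 * drops rows the abandoned plan already shipped.  Row keys stand in
 * for ctids; the buffer is capped, and once it overflows the executor
 * must stop offering to replan. */
#include <stdbool.h>
#include <stddef.h>

#define DELIVERED_MAX 1024                  /* cap on remembered rows */

typedef struct DedupFilterState {
    unsigned long delivered[DELIVERED_MAX]; /* keys already sent */
    size_t        ndelivered;
    bool          overflowed;               /* true => no more replanning */
} DedupFilterState;

/* Record a row as shipped; returns false once the buffer is full. */
static bool
dedup_remember(DedupFilterState *s, unsigned long rowkey)
{
    if (s->ndelivered >= DELIVERED_MAX)
    {
        s->overflowed = true;
        return false;
    }
    s->delivered[s->ndelivered++] = rowkey;
    return true;
}

/* For each tuple produced by the *new* plan: true => suppress it,
 * because the old plan already delivered it. */
static bool
dedup_seen(const DedupFilterState *s, unsigned long rowkey)
{
    size_t i;

    for (i = 0; i < s->ndelivered; i++)
        if (s->delivered[i] == rowkey)
            return true;
    return false;
}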
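The replanning decision would then charge for that node the same way it charges for any other, something like the following (again, names and cost units are purely illustrative):

/* Sketch only: switch plans only when the candidate's estimated cost,
 * plus the filter node's own buffer/CPU overhead, beats the *revised*
 * remaining cost of the plan currently running. */
#include <stdbool.h>

typedef struct ReplanCosts {
    double current_remaining;   /* revised cost to finish current plan */
    double candidate_total;     /* estimated cost of candidate plan */
    double dedup_overhead;      /* buffer upkeep + per-row dupe checks */
} ReplanCosts;

static bool
should_replan(const ReplanCosts *c, bool rows_shipped, bool buffer_has_room)
{
    /* v1 rule: never replan once rows have reached the client.  The
     * buffered variant relaxes that, but only while the delivered-row
     * buffer still has room. */
    if (rows_shipped && !buffer_has_room)
        return false;
    return c->candidate_total + c->dedup_overhead < c->current_remaining;
}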
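For the index-scan feedback, the interpolation could work roughly as below. hist[] plays the role of the column's equi-depth histogram bounds (in the spirit of pg_stats.histogram_bounds); the double-typed keys and the function names are simplifying assumptions on my part.

/* Sketch only: refine a mid-scan row estimate by treating the key of
 * the last row returned as a live data point.  hist[] holds nbounds
 * equi-depth bucket boundaries, so each of the nbounds-1 buckets
 * covers an equal share of the rows. */
#include <stddef.h>

/* Fraction of rows with key below v, interpolating linearly inside
 * the bucket containing v. */
static double
hist_fraction_below(const double *hist, size_t nbounds, double v)
{
    size_t nbuckets = nbounds - 1;
    size_t i;

    if (v <= hist[0])
        return 0.0;
    if (v >= hist[nbounds - 1])
        return 1.0;
    for (i = 0; i < nbuckets; i++)
    {
        if (v < hist[i + 1])
        {
            double width = hist[i + 1] - hist[i];
            double within = (width > 0.0) ? (v - hist[i]) / width : 0.0;

            return ((double) i + within) / (double) nbuckets;
        }
    }
    return 1.0;
}

/* Rows still expected from the scan over (last_key, scan_end]: the
 * estimate shrinks as the scan advances, instead of staying pinned
 * to the original static guess. */
static double
rows_remaining(const double *hist, size_t nbounds, double total_rows,
               double last_key, double scan_end)
{
    double frac = hist_fraction_below(hist, nbounds, scan_end)
                - hist_fraction_below(hist, nbounds, last_key);

    return (frac > 0.0) ? frac * total_rows : 0.0;
}

With ten buckets, for example, once last_key has passed the ninth boundary this would report at most ~10% of total_rows remaining, however optimistic the original static estimate was.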