Re: Query planner gaining the ability to replanning after start of query execution.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I'd love to have some sort of dynamic query feedback, yet it's very complicated to do it right. I am not convinced that changing the plan during a single execution is the right way to do it, not only because it sounds intrusive to do crazy things in the executor, but also because don't understand why the new plan should be any better than the old one. Can you be more elaborate how you'd want to go about it?

In your example (which presumably just has a single relation), we have no notion of whether the scan returns no rows because we were unlucky, because just the first few pages were empty of matching rows (which in my experience happens more often), or because the cardinality estimation is wrong. Even if the cardinality estimation is wrong, we have no notion of which predicate or predicate combination actually caused the misestimation. If the first few pages where empty, the same might happen with every order (so also with every available indexscan). Imagine a very simple seqscan plan of 
select * from mytab where a = 3 and b = 40 limit 1
Even if we know the cardinality is overestimated, we have no idea whether the cardinality of a = 3 or b = 40 is wrong or they just correlate, so there is no notion of which is actually the cheapest plan. Usual workaround for most of these queries is to add an order by (which has the nice addition of having a deterministic result) with an appropriate complex index, usually resulting in indexscans.

While we actually know more after the first execution of a nodes like materialize, sort or hash nodes, I rarely encounter materialize nodes in the wild. Consequently that is the place where the work is usually already done, which is especially true with the hash node. Even though it still might be more optimal to switch from a mergejoin to a hashjoin in some cases, I doubt that's worth any work (and even less the maintenance).

Best regards
Arne Roland

-----Original Message-----
From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Oliver Mattos
Sent: Monday, November 13, 2017 5:45 PM
To: pgsql-performance@xxxxxxxxxxxxxx
Subject:  Query planner gaining the ability to replanning after start of query execution.

I am interested in giving the query planner the ability to replan (or re-rank plans) after query execution has begun, based on the progression of the query so far.

Example use case:

*  A LIMIT 1 query is planned using an expensive scan which the planner expects to return a large number of results, and to terminate
early.   The reality is the query actually produces no results, and
the scan must run to completion, potentially taking thousands of times longer than expected.

*  If this plans costs were adjusted mid-execution to reflect the fact that the scan is producing far fewer rows than expected, then another query plan might come out ahead, which would complete far faster.


Has this been done before?   Are there any pitfalls to beware of?


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




-----Original Message-----
From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] On Behalf Of Oliver Mattos
Sent: Monday, November 13, 2017 5:45 PM
To: pgsql-performance@xxxxxxxxxxxxxx
Subject:  Query planner gaining the ability to replanning after start of query execution.

I am interested in giving the query planner the ability to replan (or re-rank plans) after query execution has begun, based on the progression of the query so far.

Example use case:

*  A LIMIT 1 query is planned using an expensive scan which the planner expects to return a large number of results, and to terminate
early.   The reality is the query actually produces no results, and
the scan must run to completion, potentially taking thousands of times longer than expected.

*  If this plans costs were adjusted mid-execution to reflect the fact that the scan is producing far fewer rows than expected, then another query plan might come out ahead, which would complete far faster.


Has this been done before?   Are there any pitfalls to beware of?


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux