Re: Dynamically updating the estimated cost of a transaction

Hasini Gunasinghe wrote:

> I am looking for a way to let the user know the estimated time for the transaction he requested and, while the transaction is in progress, how much time has elapsed as a fraction of the total estimated time at any given instant, re-estimating that total dynamically as the transaction runs.
I think this one needs to get added to the FAQ. To re-use from when I answered this last month: this just isn't exposed in PostgreSQL yet. Clients ask for queries to be run, and eventually they get rows of results back, but there's no notion of how many rows they're going to get in advance or how far along the executor is in the query plan. There are a couple of academic projects that have started exposing more of the query internals, but I'm not aware of anyone who's even started moving in the direction of what you'd need to produce a progress bar or estimate a total run-time. It's a hard problem--you could easily spend several years of your life on this alone and still not have even a mediocre way to predict how much time is left to execute a generic query.

In practice, people tend to save query log files showing historical information about how long queries took to run, and then use that to predict future response times. That's a much easier way to get something useful for a lot of applications than expecting you can ever estimate just based on an EXPLAIN plan.
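As a minimal sketch of that history-based approach: suppose you've already extracted past run times for a given query class from the server log (for example, lines produced with log_min_duration_statement enabled). The function below, a hypothetical helper rather than anything PostgreSQL provides, takes those historical durations plus the elapsed time of the current run and returns an estimated progress fraction.

```python
import statistics

def estimate_progress(history_secs, elapsed_secs):
    """Estimate progress of a running query from past run times.

    Uses the median of historical durations as the predicted total,
    since the median is robust to the occasional outlier run.
    """
    predicted_total = statistics.median(history_secs)
    # Cap at 0.99: the current run may simply take longer than history suggests
    return min(elapsed_secs / predicted_total, 0.99)

# Durations (seconds) scraped from past query log entries for this query
history = [12.0, 14.5, 11.8, 13.2]
print(round(estimate_progress(history, 6.5), 2))
```

This only works when the same query runs repeatedly with similar data volumes; a query whose run time varies wildly with its parameters needs history bucketed by parameter values.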

> I got to know how PostgreSQL estimates the cost for a particular operation. Does estimated cost mean the estimated time to evaluate an operation in the context of PostgreSQL? And may I know if there is any way to achieve the requirement I mentioned above with PostgreSQL?
Estimated costs are never used to predict an estimated time. An interesting research project would be trying to tie the two together more tightly, by collecting a bunch of data measuring real EXPLAIN ANALYZE execution times alongside their respective cost estimates.
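The first step of such a project might look like the sketch below: given hypothetical pairs of planner cost estimates and measured EXPLAIN ANALYZE times for the same queries, fit an ordinary least-squares line to see how many milliseconds one planner cost unit corresponds to on a given system. The data values here are made up for illustration.

```python
# Hypothetical measurements: planner cost estimates paired with the
# actual run times (ms) reported by EXPLAIN ANALYZE for the same queries.
costs = [100.0, 2500.0, 18000.0, 42000.0]
times_ms = [1.2, 23.0, 150.0, 390.0]

n = len(costs)
mean_c = sum(costs) / n
mean_t = sum(times_ms) / n

# Ordinary least-squares slope: milliseconds of run time per cost unit
slope = sum((c - mean_c) * (t - mean_t) for c, t in zip(costs, times_ms)) \
        / sum((c - mean_c) ** 2 for c in costs)
intercept = mean_t - slope * mean_c

print(f"{slope:.5f} ms per cost unit")
```

Even a good fit here only holds for one machine, one cache state, and one mix of plan node types; the same cost total can hide very different I/O patterns, which is part of why this mapping is hard to do in general.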

Even after you collected it, actually using the data from such research is quite tricky. For example, people have tried to tie some of the individual cost components to the real world--measuring the true amount of time it takes to do a sequential read vs. a seek, then adjusting random_page_cost accordingly. But if you set random_page_cost to its real-world value based on that measurement, you get a value well outside what seems to work for people in practice. This suggests the underlying cost model doesn't track the real-world quantity it's meant to represent very closely. But improving on that situation without going backwards in the quality of the plans the query optimizer produces is a tricky problem.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
