Added to TODO list:

	* Log queries where the optimizer row estimates were dramatically
	  different from the number of rows actually found (?)

---------------------------------------------------------------------------

Doug Fields wrote:
> 
> >Tom Lane wrote:
> > > Bruce Momjian <pgman@xxxxxxxxxxxxxxxx> writes:
> > > > Martijn van Oosterhout wrote:
> > > >> Firstly, I was only thinking of going for the basic nodes (Index
> > > >> Scan, Seq Scan, Distinct). Other types have far more variables.
> > > >> Secondly, even if you only count, it's useful. For example, if it
> > > >> tells you that the planner is off by a factor of 10 more than 75%
> > > >> of the time, that's useful information independent of what the
> > > >> actual variables are.
> > >
> > > And reduce the usefulness even more ;-). As a pure stats-gathering
> > > exercise it might be worth doing, but not if you only log the failure
> > > cases. How will you know how well you are doing if you take a
> > > biased-by-design sample?
> 
> Personally, given that it seems like at least once or twice a day someone
> asks about performance or "why isn't my index being used" and similar
> things, I think doing this would be a great idea.
> 
> Perhaps not necessarily in the full-fledged way, but by creating a sort of
> "ANALYZE log," wherein it logs the optimizer's estimate of a query and the
> actual results of the query, for every query. This, of course, could be
> enabled/disabled on a per-connection basis, a per-table basis (like
> OIDs), or whatever other basis makes life easiest for the developers.
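The estimate-vs-actual gap the thread wants logged can already be seen one query at a time with EXPLAIN ANALYZE. A hedged illustration (the table and column names, and the numbers in the sample output, are made up for this sketch):

```sql
-- Compare the planner's row estimate with the actual row count for one
-- query.  "orders" and "customer_id" are hypothetical names.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- Illustrative output shape (values invented):
--   Index Scan using orders_customer_idx on orders
--     (cost=0.00..8.27 rows=1 width=64)
--     (actual time=0.04..0.91 rows=850 loops=1)
--
-- The planner estimated 1 row but found 850: exactly the kind of mismatch
-- the proposed "ANALYZE log" would capture automatically across all queries.
```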
> 
> Then, when the next ANALYZE is run, it could do its usual analysis and
> apply some additional heuristics based upon what it learns from the
> "ANALYZE log," possibly to do several things:
> 
> 1) Automatically increase/decrease the SET STATISTICS target used by the
> analysis, for example, increasing it as a table grows larger and the
> "randomness" grows less than linearly with size (e.g., if you have 50 or
> 60 groups in a 1,000,000-row table, that certainly needs a higher SET
> STATISTICS, and I do it on my tables).
> 
> 2) Have an additional value in the statistics table called the
> "index_heuristic" or "random_page_adjustment_heuristic" which, when 1,
> does nothing, but otherwise modifies the cost of using an index/seq scan
> by that factor - and never change this more than a few percent each
> ANALYZE.
> 
> 3) Flag really bad query estimates in a second log (maybe the regular
> log) - let it do an analysis of the queries and flag anything two or
> three standard deviations outside.
> 
> Now, I suggest all this stuff in the name of usability and
> self-maintainability. Unfortunately, I don't have the wherewithal to
> actually assist in development.
> 
> Another possibility is to default "enable_seqscan" to OFF, or whatever
> the parameter is (I did my optimizing a while ago so it's fading), so
> that if there's an index, it will be used, regardless - as this seems to
> be what the great majority of people expect to happen. And/or add this to
> a FAQ, and let us all reply "see http://.../indexfaq.html." :)
> 
> Cheers,
> 
> Doug
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@xxxxxxxxxxxxxxxx               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
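For reference, Doug's points 1), 2), and the enable_seqscan idea all correspond to knobs that can be turned by hand today; the automatic feedback loop is what is being proposed. A hedged sketch ("orders" and "customer_id" are hypothetical names, and the numeric values are illustrative, not recommendations):

```sql
-- Point 1: raise the per-column statistics target on a large table so
-- ANALYZE samples more values than the (much lower) default:
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;

-- Point 2: the cost factor the "random_page_adjustment_heuristic" would
-- tune automatically can be nudged manually:
SET random_page_cost = 2.0;

-- The parameter Doug half-remembers as "use_seq_scan":
SET enable_seqscan = off;
```

Note that `enable_seqscan = off` does not forbid sequential scans outright; it only makes them prohibitively expensive to the planner, so a plan with no usable index still falls back to one.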