Hi Aidan,

thanks a lot for this detailed summary!

So, why I'm not using prepare here: let's say I'm testing the worst
stress case :-) Imagine you have thousands of queries like this - you
cannot prepare all of them! :-) Or maybe you'll prepare each one once,
but as I showed previously in this thread, the prepare statement itself
takes 16ms, so for a single shot there is no gain! :-)

Stressing with short and simple queries like these (and again, they
have joins - it could be even simpler :-)) gives me a result that shows
my guaranteed worst case - I know then that if I have to deploy a
bombarding OLTP-like application, my database engine will be able to
keep up with such a workload, and if I have performance problems, they
are inside the application! :-) (Well, it's very simplistic, but it's
not far from the truth :-))

Now, as you can see from your explanation, Part #2 is the most dominant
- so instead of blaming this query, why not implement a QUERY PLANNER
CACHE??? - so that if any *similar* query is recognized by the parser,
we simply *reuse* the same plan?..

Rgds,
-Dimitri

On 5/11/09, Aidan Van Dyk <aidan@xxxxxxxxxxx> wrote:
> * Dimitri <dimitrik.fr@xxxxxxxxx> [090511 11:18]:
>> Folks, it's completely crazy, but here is what I found:
>>
>> - if the HISTORY table is analyzed with target 1000, my SELECT
>> response time jumps to 3ms, and the max throughput is limited to
>> 6.000 TPS (that's what happened with 8.3.7)
>>
>> - if the HISTORY table is analyzed with target 5, my SELECT response
>> time decreases to 1.2ms (!) and then my max TPS level is ~12.000!
>> and CPU is used up to 95% even by 8.3.7 :-) and 8.4 performed better
>> just because I left its analyze target at the default value of 100.
>>
>> Can anyone explain to me why the analyze target can have such a huge
>> negative secondary effect?..
>
> It's actually pretty straightforward.
>
> The PostgreSQL query planner is a "smart planner". It takes into
> consideration all the statistics available on the columns/tables,
> expected outputs based on inputs, etc., to choose what it thinks will
> be the best plan. The more data you have in statistics (the larger
> the statistics target you have), the more CPU time it will use and
> the longer it's going to take to "plan" your queries. The tradeoff is
> hopefully better plans.
>
> But in your scenario, where you are hitting the database in the
> absolute worst possible way to use PostgreSQL - with small, repeated,
> simple queries - you're not getting the advantage of "better" plans.
> In your case, you're throwing absolutely simple queries at PG as fast
> as you can, and for each query, PostgreSQL has to:
>
> 1) Parse the given "query string"
> 2) Given the statistics available, plan the query and pick the best one
> 3) Actually run the query.
>
> Part 2 is going to dominate the CPU time in your tests - more so the
> more statistics it has to evaluate - and, unless the data has to come
> from disk (i.e. it's not in shared buffers or the cache), it is thus
> going to dominate the time before you get your results. More
> statistics means more time needed to do the planning/picking of the
> query.
>
> If you were to use prepared statements, the cost of #1 and #2 is paid
> once, and then every time you throw a new execution of the query at
> PostgreSQL, you just do #3, the easy, quick part - especially for
> small, simple queries where all the data is in shared buffers or the
> cache.
>
> a.
>
> --
> Aidan Van Dyk                                          Create like a god,
> aidan@xxxxxxxxxxx                                    command like a king,
> http://www.highrise.ca/                                work like a slave.
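
For reference, here is a minimal sketch of the two knobs discussed
above: lowering the per-column statistics target and using
PREPARE/EXECUTE. The table and column names (history, ref_object, stat)
and the query shape are illustrative placeholders only, not the exact
benchmark schema or query from this thread:

    -- Per-column statistics target: a smaller target makes ANALYZE keep a
    -- smaller histogram/MCV list, so step #2 (planning) has less to evaluate.
    -- (Table and column names are placeholders.)
    ALTER TABLE history ALTER COLUMN ref_object SET STATISTICS 5;
    ANALYZE history;

    -- Prepared statement: parse (#1) and plan (#2) are paid once at PREPARE
    -- time; each subsequent EXECUTE only performs step #3.
    PREPARE hist_sel (integer) AS
        SELECT h.horder, s.name
          FROM history h
          JOIN stat s ON s.ref = h.ref_stat
         WHERE h.ref_object = $1
         ORDER BY h.horder;

    EXECUTE hist_sel (1);

As Dimitri notes above, this only pays off when the same session
executes the prepared query many times; for a single shot, the PREPARE
cost is not recovered.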