Search Postgresql Archives

RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> 

> 

>From: Peter J. Holzer hjp-pgsql@xxxxxx

>Sent: Friday, December 10, 2021 3:43 PM

>To: pgsql-general@xxxxxxxxxxxxxxxxxxxx

>Subject: Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs

> 

>On 2021-12-10 18:04:07 +0000, Godfrin, Philippe E wrote:

>> >But in my experience the biggest problem with large tables are unstable

>> >execution plans - for most of the parameters the optimizer will choose

>> >to use an index, but for some it will erroneously think that a full

>> >table scan is faster. That can lead to a situation where a query

>> >normally takes less than a second, but sometimes (seemingly at random)

>> >it takes several minutes

>[...]

>> For Peter I have a question. What exactly causes ‘unstable execution plans’ ??

>>

>> Besides not using bind variables, bad statistics, would you elaborate

>> in what would contribute to that instability?

> 

>Not using bind variables and bad statistics are certainly big factors:

> 

>On one hand not using bind variables gives a lot more information to the

>optimizer, so it can choose a better plan at run time. On the other hand

>that makes hard to predict what plan it will choose.

> 

>Bad statistics come in many flavours: They might just be wrong, that's

>usually easy to fix. More problematic are statistics which just don't

>describe reality very well - they may not show a correlation, causing

>the optimizer to assume that two distributions are independent when they

>really aren't (since PostgreSQL 10 you can create statistics on multiple

>columns which helps in many but not all cases) or not show some other

>peculiarity of the data. Or they may be just so close to a flipping

>point that a small change causes the optimizer to choose a wildly

>different plan.

> 

>Another source is dynamically generated SQL. Your application may just

>put together SQL from fragments or it might use something like

>SQLalchemy or an ORM. In any of these cases what looks like one query

>from a user's perspective may really be a whole family of related

>queries - and PostgreSQL will try to find the optimal plan for each of

>them. Which is generally a good thing, but it adds opportunities to mess

>up.

> 

>hp

> 

>--

>_ | Peter J. Holzer | Story must make more sense than reality.

>|_|_) | |

>| | | hjp@xxxxxx | -- Charles Stross, "Creative writing

>__/ | http://www.hjp.at/ | challenge!"

 

Good answer Peter, I agree wholeheartedly. I was curious if there was something specific to Postgresql .

phil


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux