RE: [EXTERNAL] Re: performance expectations for table(s) with 2B recs

"Godfrin, Philippe E" <Philippe.Godfrin@xxxxxxx> · Mon, 13 Dec 2021 16:34:39 +0000

> 
> 
>From: Peter J. Holzer hjp-pgsql@xxxxxx

>Sent: Friday, December 10, 2021 3:43 PM
>To: pgsql-general@xxxxxxxxxxxxxxxxxxxx
>Subject: Re: [EXTERNAL] Re: performance expectations for table(s) with 2B recs
> 
>On 2021-12-10 18:04:07 +0000, Godfrin, Philippe E wrote:
>> >But in my experience the biggest problem with large tables are unstable
>> >execution plans - for most of the parameters the optimizer will choose
>> >to use an index, but for some it will erroneously think that a full
>> >table scan is faster. That can lead to a situation where a query
>> >normally takes less than a second, but sometimes (seemingly at random)
>> >it takes several minutes
>[...]
>> For Peter I have a question. What exactly causes ‘unstable execution plans’ ??
>> 
>> Besides not using bind variables, bad statistics, would you elaborate
>> in what would contribute to that instability?
> 
>Not using bind variables and bad statistics are certainly big factors:
> 
>On one hand not using bind variables gives a lot more information to the
>optimizer, so it can choose a better plan at run time. On the other hand
>that makes hard to predict what plan it will choose.
> 
>Bad statistics come in many flavours: They might just be wrong, that's
>usually easy to fix. More problematic are statistics which just don't
>describe reality very well - they may not show a correlation, causing
>the optimizer to assume that two distributions are independent when they
>really aren't (since PostgreSQL 10 you can create statistics on multiple
>columns which helps in many but not all cases) or not show some other
>peculiarity of the data. Or they may be just so close to a flipping
>point that a small change causes the optimizer to choose a wildly
>different plan.
> 
>Another source is dynamically generated SQL. Your application may just
>put together SQL from fragments or it might use something like
>SQLalchemy or an ORM. In any of these cases what looks like one query
>from a user's perspective may really be a whole family of related
>queries - and PostgreSQL will try to find the optimal plan for each of
>them. Which is generally a good thing, but it adds opportunities to mess
>up.
> 
>hp
> 
>-- 
>_ | Peter J. Holzer | Story must make more sense than reality.
>|_|_) | |
>| | | hjp@xxxxxx | -- Charles Stross, "Creative writing
>__/ | http://www.hjp.at/ | challenge!"

Good answer Peter, I agree wholeheartedly. I was curious if there was something specific to Postgresql .
phil