Search Postgresql Archives

Re: Accounting for between table correlation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Unfortunately I'm not free to share the specific schema or the query plans. They derive from an upstream vendor that is 'protective' of their data model. To get to a proper example I'll need to recreate the behavior with generic data in a generified schema.

I apologize for being frustratingly vague. I do feel like an idiot for not at least saying this was with version PG 11.10.
It has been beneficial to me that the replies so far appear to validate my understanding that no version of postgres has cross table correlation statistics in the planner.

Analyze is done frequently, and I think at sufficient sample size. The context is in a large data analysis setting and the data is changed via bulk ETL not OLTP. The effect on analyzing is just instability - the plan can flip in either direction (without underlying data changing) between giving an answer in minutes and timing out after 10s of hours. That could be indicative of too small a sample but I think in this case it is more the statistics aren't useful because it is cross table correlation that can't be accounted for.

The 'fast plans' use parallel seq scans. The 'slow plans' is using index scans. It appears a good query plan correctly predicts it should be bulk processing the tables but bad ones get fooled into trashing (hard disk, not SSD) by mispredicting too few rows to join between the tables. 

If anyone has similar experiences and is generous enough with their time to share possible solutions/work arounds then I'm most grateful. If my description is too vague to be worthwhile answering then I quite understand and apologize for the time wasted in reading.

Thank you.





On Fri, Jan 15, 2021 at 10:26 AM David G. Johnston <david.g.johnston@xxxxxxxxx> wrote:
On Fri, Jan 15, 2021 at 9:10 AM Thomas Kellerer <shammat@xxxxxxx> wrote:
Atul Kumar schrieb am 15.01.2021 um 16:29:
> As per Ron, you are not supposed to ask your questions here.
>
> As According to him, we should keep on doing research on internet
> rather than asking for support directly even you have done enough
> research and until unless “Ron” won’t be satisfied you have to do
> keep on researching.

Ron's question was perfectly valid.

Missing and wrong statistics are one reason for the planner to choose a bad execution plan.

Yeah, at first blush I didn't think analyze really mattered (and it mostly doesn't because while you can keep the statistics up-to-date the multi-table nature of the problem means they are only marginally helpful here), but that just points out the under-specified nature of the original posting.  Taken as a simple question of "is there a way to work around the lack of multi-table statistics" the analyze, and even the specific queries, don't matter all that much.  But it also would be much more useful if the OP would choose a single problematic query and show the schema, query, and explain results, hopefully both good and bad, and comment on how analyze seems to affect the plan choice.  But for the general question about overcoming our statistics limitations the analyze point is not relevant.

David J.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux