On Wed, Apr 17, 2019 at 11:52:44PM -0400, Gunther wrote:
Hi guys. I don't want to be pushy, but I found it strange that after so much lively back and forth getting to the bottom of this, suddenly my follow-up from last night remained completely without reply. I wonder if it even got received. For those who read their emails with modern readers (I know I too am from a time when I wrote everything in plain text) I marked some important questions in bold.
It was received (and it's visible in the archives). It's right before Easter, so I guess some people may already be on vacation.

As for the issue - I think the current hypothesis is that the data distribution is skewed in some strange way, triggering some unexpected behavior in hash join. That seems plausible, but it's really hard to investigate without knowing anything about the data distribution :-(

It would be possible to do at least one of these two things:

(a) export pg_stats info about distribution of the join keys

The number of tables involved in the query is not that high, and this would allow us to generate a data set approximating your data. The one thing this can't do is show how it's affected by WHERE conditions.

(b) export data for join keys

This is similar to (a), but it would allow filtering data by the WHERE conditions first. The amount of data would be higher, although we only need data from the columns used as join keys.

Of course, if those key values contain sensitive data, it may not be possible, but perhaps you could hash it in some way.

regards

-- 
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
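
To illustrate the "hash it in some way" idea from option (b), here is a minimal Python sketch. It is only one possible approach, not something prescribed in this thread: the function names and the secret value are hypothetical, and it assumes the join-key values have already been exported into a Python list. It replaces each key with a keyed hash (HMAC-SHA256), so equal keys stay equal and the number of duplicates per key is preserved, while the original values are not revealed.

```python
import hashlib
import hmac


def pseudonymize_key(value, secret: bytes) -> str:
    """Map one join-key value to a keyed hash.

    Equal inputs always give equal outputs, so duplicate counts survive.
    repr() is used so that 1 and "1" are treated as different keys.
    """
    data = repr(value).encode("utf-8")
    return hmac.new(secret, data, hashlib.sha256).hexdigest()


def pseudonymize_column(values, secret: bytes):
    """Hash every non-NULL value in a column; keep NULLs (None) as they are."""
    return [None if v is None else pseudonymize_key(v, secret) for v in values]


# Hypothetical usage: the secret stays with the data owner and is never shared.
SECRET = b"replace-with-a-private-random-value"
sample_keys = [1, 2, 1, None, 2, 1]
hashed_keys = pseudonymize_column(sample_keys, SECRET)
```

Note that this keeps how often each key repeats, but not the ordering of the keys, and the hashed values are not the values PostgreSQL itself would hash. The result can therefore only approximate the skew, not reproduce the exact bucket assignment.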