On Wed, Aug 12, 2020 at 8:06 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Ayub M <hiayub@xxxxxxxxx> writes:
> This is PostgreSQL 11 on AWS, there is a mview query in this OLAP database,
> the tables involved are huge - 50-100m records on average records hundreds
> of columns in most cases. The query runs for a while and then errors out
> saying "No space left on device". I could see it generating around 500gb of
> temp file data. At times it goes thru and at times it fails - probably due
> to other queries running at the same time and causing failure.
Are you sure that these queries are actually producing the answers you
want? It sounds suspiciously like you are computing underconstrained
joins.
--> Yes, it is as per the business requirement.
> The joins are happening on around 10 tables and all are joining on the PK
> and FK columns. I see partition pruning happening but the hash joins are
> killing the query.
> Is there any way to avoid hash joins?
TBH, you are asking the wrong question. A merge join would take about as
much temporary space, and a nestloop join over so much data would probably
not finish in an amount of time you're willing to wait. Indexes are NOT
a magic solution here. What you need to be thinking about is how to not
need to process so much data.
If you really need to have this proven to you, you can try "set
enable_hashjoin = off", but I don't think you'll find that better.
--> You are right, neither merge join nor nested loop are resolving the issue.
regards, tom lane
Regards,
Ayub