I have stopped this query after about 16 hours. At the same time I ran a 'explain analyze' on the same query to find out why it took so long. These two processes generated temporary files of 173GB in /var/lib/postgresql/9.4/main/base/pgsql_tmp.
COPY (SELECT A.ut, B.go AS funding_org, B.gn AS grant_no, C.gt AS thanks, D.au FROM isi.funding_text C, isi.rauthor D, isi.africa_uts A LEFT JOIN isi.funding_org B ON (B.ut = A.ut) WHERE (C.ut IS NOT NULL OR B.ut IS NOT NULL) AND D.rart_id = C.ut AND C.ut = B.ut GROUP BY A.ut, GO, gn, gt, au ORDER BY funding_org) TO '/tmp/africafunding2.csv' WITH csv quote '"' DELIMITER ',';
A modified version of this query finished in 1min 27 sek:
COPY
(SELECT 'UT'||A.ut,
B.go AS funding_org,
B.gn AS grant_no,
C.gt AS thanks
FROM isi.africa_uts A
LEFT JOIN isi.funding_org B ON (B.ut = A.ut)
LEFT JOIN isi.funding_text C ON (A.ut = C.ut)
WHERE (C.ut IS NOT NULL
OR B.ut IS NOT NULL)
GROUP BY A.ut,
GO,
gn,
gt) TO '/tmp/africafunding.csv' WITH csv quote '"' DELIMITER ',';
As I said, the process of 'explain analyze' of the problematic query contributed to the 173GB
temporary files and did not finish in about 16 hours.
Just explain of the query part produces this:Any idea on why adding the rauthor table in the query is so problematic?
"Sort (cost=4781458203.46..4798118612.44 rows=6664163593 width=390)"
" Output: a.ut, b.go, b.gn, c.gt, (array_to_string(array_agg(d.au), ';'::text)), b.go, b.gn, d.au"
" Sort Key: b.go"
" -> GroupAggregate (cost=2293037801.73..2509623118.51 rows=6664163593 width=390)"
" Output: a.ut, b.go, b.gn, c.gt, array_to_string(array_agg(d.au), ';'::text), b.go, b.gn, d.au"
" Group Key: a.ut, b.go, b.gn, c.gt, d.au"
" -> Sort (cost=2293037801.73..2309698210.72 rows=6664163593 width=390)"
" Output: a.ut, c.gt, b.go, b.gn, d.au"
" Sort Key: a.ut, b.go, b.gn, c.gt, d.au"
" -> Merge Join (cost=4384310.92..21202716.78 rows=6664163593 width=390)"
" Output: a.ut, c.gt, b.go, b.gn, d.au"
" Merge Cond: ((c.ut)::text = (d.rart_id)::text)"
" -> Merge Join (cost=635890.84..1675389.41 rows=6069238 width=412)"
" Output: c.gt, c.ut, a.ut, b.go, b.gn, b.ut"
" Merge Cond: ((c.ut)::text = (b.ut)::text)"
" Join Filter: ((c.ut IS NOT NULL) OR (b.ut IS NOT NULL))"
" -> Merge Join (cost=635476.30..675071.77 rows=1150354 width=348)"
" Output: c.gt, c.ut, a.ut"
" Merge Cond: ((a.ut)::text = (c.ut)::text)"
" -> Index Only Scan using africa_ut_idx on isi.africa_uts a (cost=0.42..19130.19 rows=628918 width=16)"
" Output: a.ut"
" -> Sort (cost=632211.00..640735.23 rows=3409691 width=332)"
" Output: c.gt, c.ut"
" Sort Key: c.ut"
" -> Seq Scan on isi.funding_text c (cost=0.00..262238.91 rows=3409691 width=332)"
" Output: c.gt, c.ut"
" -> Index Scan using funding_org_ut_idx on isi.funding_org b (cost=0.56..912582.50 rows=9835492 width=64)"
" Output: b.id, b.ut, b.go, b.gn"
" -> Materialize (cost=0.57..17914892.46 rows=159086560 width=26)"
" Output: d.id, d.rart_id, d.au, d.ro, d.ln, d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv"
" -> Index Scan using rauthor_rart_id_idx on isi.rauthor d (cost=0.57..17517176.06 rows=159086560 width=26)"
" Output: d.id, d.rart_id, d.au, d.ro, d.ln, d.af, d.ras, d.ad, d.aa, d.em, d.ag, d.tsv"
My systerm:
768 GB RAM
shared_ buffers: 32GB
work_mem: 4608MB
Regardsshared_ buffers: 32GB
work_mem: 4608MB
--
Because experiencing your loyal love is better than life itself,
my lips will praise you. (Psalm 63:3)
my lips will praise you. (Psalm 63:3)