Hi, I have to provide a summary of how much spaces is used in the large objects table based on a group by condition. I would expect an index only scan on the large object table, but a full seq scan that last for hours is performed. BigSql distribution PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit Win Server 2012 R2, 8GB RAM pg server mem settings: effective_cache_size | 6GB maintenance_work_mem | 819MB random_page_cost | 2 shared_buffers | 2GB work_mem | 32MB Testcase 1: Here is a simplified query, timing and the explain plan: SELECT ima.sit_cod, COUNT(*)*2048*4/3 FROM images ima JOIN pg_largeobject ON (loid=ima.val) GROUP BY ima.sit_cod; Time: 343997.661 ms (about 6 min) ran on a small DB, took 4hrs on a ~1TB table HashAggregate (cost=2452378.86..2452379.01 rows=15 width=14) Group Key: ima.sit_cod -> Hash Join (cost=1460.40..2418245.74 rows=6826625 width=6) Hash Cond: (pg_largeobject.loid = ima.val) ---------> Seq Scan on pg_largeobject (cost=0.00..2322919.25 rows=6826625 width=4) -> Hash (cost=1114.62..1114.62 rows=27662 width=10) -> Seq Scan on images ima (cost=0.00..1114.62 rows=27662 width=10) Testcase 2: A simple count(*) for a specific group (small group) perform an Index Only Scan and last few secs. SELECT COUNT(*) FROM images ima JOIN pg_largeobject ON (loid=ima.val) WHERE sit_cod='W8213'; count ------- 8599 Time: 12.090 ms Aggregate (cost=11930.30..11930.31 rows=1 width=8) -> Nested Loop (cost=2.87..11918.58 rows=4689 width=0) -> Bitmap Heap Scan on images ima (cost=2.43..37.81 rows=19 width=4) Recheck Cond: ((sit_cod)::text = 'W8213'::text) -> Bitmap Index Scan on ima_pk (cost=0.00..2.43 rows=19 width=0) Index Cond: ((sit_cod)::text = 'W8213'::text) ---------> Index Only Scan using pg_largeobject_loid_pn_index on pg_largeobject (cost=0.43..621.22 rows=408 width=4) Index Cond: (loid = ima.val) Testcase 3: However, larger group still perform full seq scan SELECT COUNT(*) FROM images ima JOIN pg_largeobject ON (loid=ima.val) WHERE sit_cod='W8317'; count --------- 2209704 Time: 345638.118 ms (about 6 min) Aggregate (cost=2369363.01..2369363.02 rows=1 width=8) -> Hash Join (cost=1125.63..2365419.35 rows=1577463 width=0) Hash Cond: (pg_largeobject.loid = ima.val) ---------> Seq Scan on pg_largeobject (cost=0.00..2322919.25 rows=6826625 width=4) -> Hash (cost=1045.73..1045.73 rows=6392 width=4) -> Bitmap Heap Scan on images ima (cost=127.83..1045.73 rows=6392 width=4) Recheck Cond: ((sit_cod)::text = 'W8317'::text) -> Bitmap Index Scan on ima_pk (cost=0.00..126.23 rows=6392 width=0) Index Cond: ((sit_cod)::text = 'W8317'::text) Pretty sure that using the index would lead to much better perf. Any idea of what can be done? Jean-Marc Lessard Jean-Marc Lessard |