RE: Inefficient full seq scan on pg_largeobject instead of index scan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

 

I have to provide a summary of how much spaces is used in the large objects table based on a group by condition.

I would expect an index only scan on the large object table, but a full seq scan that last for hours is performed.

 

BigSql distribution

PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit

Win Server 2012 R2, 8GB RAM

pg server mem settings:

effective_cache_size | 6GB

maintenance_work_mem | 819MB

random_page_cost     | 2

shared_buffers       | 2GB

work_mem             | 32MB

 

Testcase 1: Here is a simplified query, timing and the explain plan:

SELECT ima.sit_cod, COUNT(*)*2048*4/3

  FROM images ima JOIN pg_largeobject ON (loid=ima.val)

GROUP BY ima.sit_cod;

Time: 343997.661 ms (about 6 min) ran on a small DB, took 4hrs on a ~1TB table

 

HashAggregate  (cost=2452378.86..2452379.01 rows=15 width=14)

  Group Key: ima.sit_cod

  ->  Hash Join  (cost=1460.40..2418245.74 rows=6826625 width=6)

        Hash Cond: (pg_largeobject.loid = ima.val)

--------->  Seq Scan on pg_largeobject  (cost=0.00..2322919.25 rows=6826625 width=4)

        ->  Hash  (cost=1114.62..1114.62 rows=27662 width=10)

              ->  Seq Scan on images ima  (cost=0.00..1114.62 rows=27662 width=10)

 

 

Testcase 2: A simple count(*) for a specific group (small group) perform an Index Only Scan and last few secs.

SELECT COUNT(*)

  FROM images ima JOIN pg_largeobject ON (loid=ima.val)

WHERE sit_cod='W8213';

count

-------

  8599

Time: 12.090 ms

 

Aggregate  (cost=11930.30..11930.31 rows=1 width=8)

  ->  Nested Loop  (cost=2.87..11918.58 rows=4689 width=0)

        ->  Bitmap Heap Scan on images ima  (cost=2.43..37.81 rows=19 width=4)

              Recheck Cond: ((sit_cod)::text = 'W8213'::text)

              ->  Bitmap Index Scan on ima_pk  (cost=0.00..2.43 rows=19 width=0)

                    Index Cond: ((sit_cod)::text = 'W8213'::text)

--------->  Index Only Scan using pg_largeobject_loid_pn_index on pg_largeobject  (cost=0.43..621.22 rows=408 width=4)

              Index Cond: (loid = ima.val)

 

 

Testcase 3: However, larger group still perform full seq scan

SELECT COUNT(*)

  FROM images ima JOIN pg_largeobject ON (loid=ima.val)

WHERE sit_cod='W8317';

  count

---------

2209704

Time: 345638.118 ms (about 6 min)

 

Aggregate  (cost=2369363.01..2369363.02 rows=1 width=8)

  ->  Hash Join  (cost=1125.63..2365419.35 rows=1577463 width=0)

        Hash Cond: (pg_largeobject.loid = ima.val)

--------->  Seq Scan on pg_largeobject  (cost=0.00..2322919.25 rows=6826625 width=4)

        ->  Hash  (cost=1045.73..1045.73 rows=6392 width=4)

              ->  Bitmap Heap Scan on images ima  (cost=127.83..1045.73 rows=6392 width=4)

                    Recheck Cond: ((sit_cod)::text = 'W8317'::text)

                    ->  Bitmap Index Scan on ima_pk  (cost=0.00..126.23 rows=6392 width=0)

                          Index Cond: ((sit_cod)::text = 'W8317'::text)

 

Pretty sure that using the index would lead to much better perf.

Any idea of what can be done?

Jean-Marc Lessard
Administrateur de base de données / Database Administrator

Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux