pg_trgm and slow bitmap index scan plan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

I've been trying to apply pg_tgrm for the search-function of my application. The database fits a few times in the available RAM, and is mostly read-only.
Plans, schema and configs in attachment. Postgresql version 9.1.4 on Debian.

When just searching in one table, it behaves perfectly here. When I put constraints on multiple connected tables (performance and performer), it takes some bad decisions. Somehow the planner thinks that an index scan on a trigram index (on a string) is as fast as an index scan on a btree of an int. Because of that, it will combine both index scans into an "AND" bitmap index scan. Since this is done in a nested loop, the performance gets very bad. The trigram index scan should not be repeated as it is relatively slow and always the same query.

When I disable bitmap scans, it will search on both tables and then hash everything together. This avoids launching the same index scan over and over again. This is much faster.

Since my database is mostly in memory, I guess I could safely disable bitmap scan (or at least for some query), since I understand that this kind of scan is often a way to have a better IO performance. There's little IO in my setup.
However, I'd rather get some help in fixing it right!

Thanks,

Mathieu
shs-dev=# explain analyze select e.id, e.title from performance e join performer p on e.performer_id = p.id WHERE e.title_ ~~ '%beatles%' AND p.name_ ~~ '%pepper%';
                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=13.43..88.78 rows=1 width=22) (actual time=352.961..352.961 rows=0 loops=1)
   ->  Bitmap Heap Scan on performer p  (cost=4.32..23.09 rows=5 width=4) (actual time=3.276..3.419 rows=25 loops=1)
         Recheck Cond: (name_ ~~ '%pepper%'::text)
         ->  Bitmap Index Scan on performer_name__trgm_idx  (cost=0.00..4.31 rows=5 width=0) (actual time=3.268..3.268 rows=25 loops=1)
               Index Cond: (name_ ~~ '%pepper%'::text)
   ->  Bitmap Heap Scan on performance e  (cost=9.11..13.12 rows=1 width=26) (actual time=13.978..13.978 rows=0 loops=25)
         Recheck Cond: ((performer_id = p.id) AND (title_ ~~ '%beatles%'::text))
         ->  BitmapAnd  (cost=9.11..9.11 rows=1 width=0) (actual time=13.975..13.975 rows=0 loops=25)
               ->  Bitmap Index Scan on performance_performer_idx  (cost=0.00..4.35 rows=10 width=0) (actual time=0.008..0.008 rows=5 loops=25)
                     Index Cond: (performer_id = p.id)
               ->  Bitmap Index Scan on performance_title_tgrm_idx  (cost=0.00..4.51 rows=20 width=0) (actual time=14.545..14.545 rows=22 loops=24)
                     Index Cond: (title_ ~~ '%beatles%'::text)
 Total runtime: 352.996 ms
(13 rows)

shs-dev=# set enable_bitmapscan = off;
SET
shs-dev=# explain analyze select e.id, e.title from performance e join performer p on e.performer_id = p.id WHERE e.title_ ~~ '%beatles%' AND p.name_ ~~ '%pepper%';
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=24.43..109.22 rows=1 width=22) (actual time=18.563..18.563 rows=0 loops=1)
   Hash Cond: (e.performer_id = p.id)
   ->  Index Scan using performance_title_tgrm_idx on performance e  (cost=0.00..84.71 rows=20 width=26) (actual time=1.237..15.098 rows=22 loops=1)
         Index Cond: (title_ ~~ '%beatles%'::text)
   ->  Hash  (cost=24.36..24.36 rows=5 width=4) (actual time=3.455..3.455 rows=25 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Index Scan using performer_name__trgm_idx on performer p  (cost=0.00..24.36 rows=5 width=4) (actual time=0.159..3.447 rows=25 loops=1)
               Index Cond: (name_ ~~ '%pepper%'::text)
 Total runtime: 18.590 ms
(9 rows)




shs-dev=# \d performance
                                       Table "public.performance"
      Column      |           Type           |                        Modifiers                         
------------------+--------------------------+----------------------------------------------------------
 created_by       | integer                  | not null
 creation_date    | timestamp with time zone | not null
 comments         | text                     | 
 owned_by         | integer                  | not null
 id               | integer                  | not null default nextval('performance_id_seq'::regclass)
 object_type      | text                     | not null default 'performance'::text
 active           | boolean                  | not null default true
 editor_note      | text                     | 
 title            | text                     | 
 title_           | text                     | 
 performer_id     | integer                  | 
 first_release_id | integer                  | 
 vperf_id         | integer                  | 
 perf_date        | partial_date             | 
 bonustrack       | boolean                  | not null default false
 type_id          | integer                  | not null
 instrumental     | boolean                  | not null default false
 init_rev_level   | smallint                 | not null default 1
 curr_rev_level   | smallint                 | not null default 1
 revision_date    | timestamp with time zone | 
 revised_by       | integer                  | 
Indexes:
    "performance_pkey" PRIMARY KEY, btree (id)
    "performance_create_idx" btree (creation_date)
    "performance_medium_idx" btree (first_release_id)
    "performance_own_idx" btree (owned_by)
    "performance_performer_idx" btree (performer_id)
    "performance_title_tgrm_idx" gist (title_ extensions.gist_trgm_ops)
Check constraints:
    "active_check" CHECK (active)
    "bonus_medium" CHECK (NOT (bonustrack AND first_release_id IS NULL))
    "bonus_revision" CHECK (NOT (bonustrack AND curr_rev_level >= 1))
    "performance_object_type_check" CHECK (object_type = 'performance'::text)
Foreign-key constraints:
    "performance_first_medium_id_fkey" FOREIGN KEY (first_release_id) REFERENCES release(id)
    "performance_performer_id_fkey" FOREIGN KEY (performer_id) REFERENCES performer(id)
    "performance_revised_by_fkey" FOREIGN KEY (revised_by) REFERENCES dbuser(id)
    "performance_type_id_fkey" FOREIGN KEY (type_id) REFERENCES performance_type(id)
    "performance_vperf_id_fkey" FOREIGN KEY (vperf_id) REFERENCES vperf(id)
Referenced by:
    TABLE "didyouknow" CONSTRAINT "didyouknow_performance_id_fkey" FOREIGN KEY (performance_id) REFERENCES performance(id) ON DELETE CASCADE
    TABLE "part" CONSTRAINT "part_performance_id_fkey" FOREIGN KEY (performance_id) REFERENCES performance(id) ON DELETE CASCADE
    TABLE "perf_rating" CONSTRAINT "perf_rating_song_fkey" FOREIGN KEY (performance_id) REFERENCES performance(id) ON DELETE CASCADE
    TABLE "performance_itunes" CONSTRAINT "performance_itunes_performance_id_fkey" FOREIGN KEY (performance_id) REFERENCES performance(id) ON DELETE CASCADE
    TABLE "performance_youtube" CONSTRAINT "performance_youtube_performance_id_fk



shs-dev=# \d performer
                                     Table "public.performer"
    Column     |           Type           |                       Modifiers                        
---------------+--------------------------+--------------------------------------------------------
 created_by    | integer                  | not null
 creation_date | timestamp with time zone | 
 comments      | text                     | 
 owned_by      | integer                  | not null
 id            | integer                  | not null default nextval('performer_id_seq'::regclass)
 object_type   | text                     | not null default 'performer'::text
 active        | boolean                  | not null default true
 editor_note   | text                     | 
 name          | text                     | not null
 name_         | text                     | not null
Indexes:
    "performer_pkey" PRIMARY KEY, btree (id)
    "performer_name_idx" UNIQUE, btree (name)
    "performer_name__idx" btree (name_ text_pattern_ops)
    "performer_name__trgm_idx" gist (name_ extensions.gist_trgm_ops)
    "performer_own_idx" btree (owned_by)
Check constraints:
    "active_check" CHECK (active)
    "performer_object_type_check" CHECK (object_type = 'performer'::text)
Referenced by:
    TABLE "release" CONSTRAINT "medium_performer_id_fkey" FOREIGN KEY (performer_id) REFERENCES performer(id)
    TABLE "membership" CONSTRAINT "membership_performer_id_fkey" FOREIGN KEY (performer_id) REFERENCES performer(id) ON DELETE CASCADE
    TABLE "performance" CONSTRAINT "performance_performer_id_fkey" FOREIGN KEY (performer_id) REFERENCES performer(id)
Triggers:
    performer_data_trigger BEFORE INSERT OR DELETE OR UPDATE ON performer FOR EACH ROW EXECUTE PROCEDURE performer_data_trigger()
Inherits: primary_object
ey" FOREIGN KEY (performance_id) REFERENCES performance(id) ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "sample" CONSTRAINT "sample_sampled_id_fkey" FOREIGN KEY (sampled_id) REFERENCES performance(id)
    TABLE "sample" CONSTRAINT "sample_sampler_id_fkey" FOREIGN KEY (sampler_id) REFERENCES performance(id)
    TABLE "track" CONSTRAINT "track_performance_id_fkey" FOREIGN KEY (performance_id) REFERENCES performance(id) ON DELETE CASCADE
    TABLE "work" CONSTRAINT "work_first_performance_id_fkey" FOREIGN KEY (first_performance_id) REFERENCES performance(id)
    TABLE "work" CONSTRAINT "work_first_recording_id_fkey" FOREIGN KEY (first_recording_id) REFERENCES performance(id)
    TABLE "work" CONSTRAINT "work_first_release_id_fkey" FOREIGN KEY (first_release_id) REFERENCES performance(id)
Triggers:
    performance_data_trigger BEFORE INSERT OR DELETE OR UPDATE ON performance FOR EACH ROW EXECUTE PROCEDURE performance_data_trigger()
Inherits: primary_object





Attachment: postgresql.conf
Description: Binary data

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux