Re: Slow "not in array" operation

Marco Colli <collimarco91@xxxxxxxxx> · Wed, 13 Nov 2019 12:56:22 +0100

> the answer is that is because it is a GIN index. Make the same index only as btree, and you should get good performance as it can filter the tags within a given project without visiting the table.

Currently I have this GIN index:
    "index_subscriptions_on_project_id_and_tags" gin (project_id, tags) WHERE trashed_at IS NULL

It uses the btree_gin extension and works perfectly for tag search, except for the "NOT" operator. I don't understand why it doesn't use the GIN index also for the "NOT" operator.
The problem is that I cannot create the same index with BTree, because PG doesn't support BTree on array :( 

On Wed, Nov 13, 2019 at 12:30 PM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Wed, Nov 13, 2019 at 4:20 AM Marco Colli <collimarco91@xxxxxxxxx> wrote:
Replying to the previous questions:
- work_mem = 64MB (there are hundreds of connections)
- the project 123 has more than 7M records, and those that don't have the tag 'en' are 4.8M

What was the plan for the one that took 500ms?

This is the query / plan without the filter on tags:

SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;

                 QUERY PLAN                                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=291342.67..291342.68 rows=1 width=8) (actual time=354.556..354.556 rows=1 loops=1)
   ->  Gather  (cost=291342.05..291342.66 rows=6 width=8) (actual time=354.495..374.305 rows=7 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         ->  Partial Aggregate  (cost=290342.05..290342.06 rows=1 width=8) (actual time=349.799..349.799 rows=1 loops=7)
               ->  Parallel Index Only Scan using index_subscriptions_on_project_id_and_uid on subscriptions  (cost=0.56..287610.27 rows=1092713 width=0) (actual time=0.083..273.018 rows=1030593 loops=7)
                     Index Cond: (project_id = 123)
                     Heap Fetches: 280849
 Planning Time: 0.753 ms
 Execution Time: 374.483 ms
(10 rows)

My previous comment about the bitmap index scan taking half the time was a slip of the eye, I was comparing *cost* of the bitmap index scan to the *time* of the overall plan.  But then the question is, why isn't it doing an index-only scan on  "index_subscriptions_on_project_id_and_tags"?  And the answer is that is because it is a GIN index.  Make the same index only as btree, and you should get good performance as it can filter the tags within a given project without visiting the table.

Cheers,

Jeff