Search Postgresql Archives

Re: PostgreSQL GIN index not used when ts_query language is fetched from a column

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On 15 Sep 2016, at 14:46, Jaap Roes <jroes@xxxxxxxxxxx> wrote:
> 
> I've got a table that stores some multilingual content:
> 
>    CREATE TABLE search (
>      content text NOT NULL,
>      language regconfig NOT NULL,
>      fulltext tsvector
>    );
>    CREATE INDEX search_fulltext ON search USING GIN(fulltext);
...
> To make sure I always search in the correct language I use these queries:
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
>    (1 row)
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
>    (1 row)
> 
> Because hardcoding the language doesn't give the correct results:
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
>    (0 rows)
> 
>    SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
>    (0 rows)
> 
> The problem however is that PostgreSQL doesn't use the GIN index when using the first set of queries and instead does a sequential scan:
...
>    EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 'shine’);
> 
>    Seq Scan on search  (cost=0.00..17.35 rows=2 width=136) (actual time=0.040..0.044 rows=1 loops=1)
>        Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
>        Rows Removed by Filter: 5
>    Planning time: 0.039 ms
>    Execution time: 0.064 ms
> 
> While it does when hardcoding a language:
> 
>    EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'vlieg');
>    Bitmap Heap Scan on search  (cost=12.63..23.66 rows=82 width=0) (actual time=0.044..0.044 rows=1 loops=1)
>      Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
>      Heap Blocks: exact=1
>        ->  Bitmap Index Scan on search_fulltext  (cost=0.00..12.61 rows=82 width=0) (actual time=0.037..0.037 rows=1 loops=1)
>          Index Cond: (fulltext @@ '''vlieg'''::tsquery)
>    Planning time: 0.128 ms
>    Execution time: 0.065 ms
> 
> So my question is: Is it at all possible to use a column in the ts_query to use the correct language config and still have Postgres use the GIN index?

So I came up with a solution. Pre-localise the query in a join and use that to filter the matches:

    SELECT * FROM search s 
    INNER JOIN (
        SELECT 'dutch'::regconfig AS language, to_tsquery('dutch', 'shine') as q
        UNION SELECT 'english'::regconfig AS language, to_tsquery('english', 'shine') as q
        UNION SELECT 'simple'::regconfig AS language, to_tsquery('simple', 'shine') as q
    ) q ON (s.language=q.language) 
    WHERE fulltext @@ q;

This seems to work, but the query plan looks a bit confusing, so I’m not super confident about the correctness:

    Nested Loop  (cost=205.44..1327.12 rows=188 width=1590) (actual time=3.350..7.010 rows=16 loops=1)
      ->  Unique  (cost=0.08..0.11 rows=3 width=0) (actual time=0.010..0.021 rows=3 loops=1)
            ->  Sort  (cost=0.08..0.09 rows=3 width=0) (actual time=0.008..0.011 rows=3 loops=1)
                  Sort Key: ('dutch'::regconfig), ('''vlieg'''::tsquery)
                  Sort Method: quicksort  Memory: 25kB
                  ->  Append  (cost=0.00..0.06 rows=3 width=0) (actual time=0.001..0.002 rows=3 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
                        ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
      ->  Bitmap Heap Scan on search s  (cost=205.36..441.70 rows=63 width=1554) (actual time=2.310..2.318 rows=5 loops=3)
            Recheck Cond: ((fulltext @@ ('''vlieg'''::tsquery)) AND ((language)::oid = (('dutch'::regconfig))::oid))
            Heap Blocks: exact=16
            ->  BitmapAnd  (cost=205.36..205.36 rows=63 width=0) (actual time=2.303..2.303 rows=0 loops=3)
                  ->  Bitmap Index Scan on search_fulltext  (cost=0.00..17.41 rows=188 width=0) (actual time=0.018..0.018 rows=16 loops=3)
                        Index Cond: (fulltext @@ ('''vlieg'''::tsquery))
                  ->  Bitmap Index Scan on search_language  (cost=0.00..187.67 rows=12539 width=0) (actual time=2.277..2.277 rows=12539 loops=3)
                        Index Cond: ((language_config)::oid = (('dutch'::regconfig))::oid)
    Planning time: 0.228 ms
    Execution time: 7.058 ms

Is this the way to go? Or is there a better way, I’m eager to find out!



-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux