Search Postgresql Archives

Re: Forcing the right queryplan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry for not replying earlier, I've been quite busy.

On 31 Aug 2010, at 16:50, Henk van Lingen wrote:

> syslog=# \d systemevents
>                                         Table "public.systemevents"
>       Column       |            Type             |                         Modi
> fiers                         
> --------------------+-----------------------------+-----------------------------
> ------------------------------
> id                 | integer                     | not null default nextval('sy

(...)

> message            | text                        | 

(...)

> Indexes:
>    "systemevents_pkey" PRIMARY KEY, btree (id)
>    "fromhost_idx" btree (fromhost)
>    "msgs_idx" gin (to_tsvector('english'::regconfig, message))
> 
> The GIN index is to do text searching (via LogAnalyzer).
> 
> Now there are two types of query plans:

Do you have output of explain analyse for these queries as well? It's hard to see what is actually going on with just the explain - we can't see which part of the query is more expensive than the planner expected, for starters.

> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 100;                                                   QUERY PLAN                   
> 
> --------------------------------------------------------------------------------
> ---------------------------------
> Limit  (cost=0.00..10177.22 rows=100 width=159)
>   ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..
> 1052934.86 rows=10346 width=159)
>         Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
> 211.112.9'::text))
> (3 rows)
> 
> This one is useless (takes very long). However this one:
> 
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM systemevents WHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 500;
>                                                    QUERY PLAN                  
> 
> --------------------------------------------------------------------------------
> -----------------------------------
> Limit  (cost=40928.89..40930.14 rows=500 width=159)
>   ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
>         Sort Key: id
>         ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=1034
> 6 width=159)
>               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
> squery('131.211.112.9'::text))
>               ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346
> width=0)
>                     Index Cond: (to_tsvector('english'::regconfig, message) @@ 
> to_tsquery('131.211.112.9'::text))
> (7 rows)
> 
> works acceptable.

Odd that more records and a more complicated plan gives faster results... That's why I think we'd really want to see explain analyse output.
I'm guessing that there are a lot of records matching your search string and that you've found the cut-off point where the planner thinks you're throwing away enough rows that it's not very useful to first select all the matching records before sorting the results.

I think it decided to just start searching backwards along the id and returning the rows that match that IP (and are visible to your transaction) would be faster than trying to work with all the rows that match that IP.

This probably means it misjudged the costs of sorting your index backwards, which indicates that your planning statistics are off, or that your cost parameters aren't appropriate for your system.



One thing I do notice is that the first plan uses the index on id instead of the ts_vector one. For queries like those you could try to use a combined index like this:

CREATE INDEX msgs_idx (to_tsvector('english'::regconfig, message), id) ON systemevents USING (gin);

Whether to use gist or gin there I really don't know. I'd probably go for gist, I seem to recall that gin is fairly heavy to use.

> How to use the right plan regardless of the 'LIMIT-size'?


You could try turning off planner-options, but that's probably a fairly bad idea.

Other options are to use a prepared statement or a stored procedure with the IP as a parameter, which force the planner to use a more general plan because it doesn't know which values you're going to search for before it plans the query.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4c8675d010409863511634!



-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux