Re: FTS performance issue - planner problem identified (but only partially resolved)

________________________________________
From: Stefan Keller [sfkeller@xxxxxxxxx]
>Sent: Saturday, 20 July 2013, 01:55
>
>Hi Marc
>
>Thanks a lot for your hint!
>
>You mean doing a "SET track_counts (true);" for the whole session?

No, I mean:

ALTER TABLE <table> ALTER COLUMN <ts_vector_column> SET STATISTICS 0;

and remove the existing statistics:

DELETE FROM pg_catalog.pg_statistic
WHERE starelid = '<table>'::regclass
AND staattnum = (SELECT attnum
                 FROM pg_attribute
                 WHERE attrelid = '<table>'::regclass
                 AND attname = '<ts_vector_column>'::name);
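You can check whether the planner still sees statistics for the column afterwards, for example via the pg_stats view (a sketch; <table> and <ts_vector_column> are the placeholders used above):

SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = '<table>'
AND attname = '<ts_vector_column>';

-- No row returned means no statistics are stored for that column.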

But you should first find out what proportion of your text-search queries are actually faster with a table scan, because those table scans will probably no longer happen afterwards!
(Except if further columns of your 'FullTextSearch' table are considered by the planner.)
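One way to compare the two plans is to force each in turn and time them; this is a sketch using the query from Stefan's example (table and column names come from this thread):

-- Disable sequential scans so the planner picks the GIN index plan:
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM FullTextSearch
WHERE content_tsv_gin @@ plainto_tsquery('english', 'good');

-- Re-enable them and compare against the planner's default choice:
RESET enable_seqscan;
EXPLAIN ANALYZE
SELECT * FROM FullTextSearch
WHERE content_tsv_gin @@ plainto_tsquery('english', 'good');

If the seqscan plan wins for many of your typical queries, dropping the statistics (which steers the planner toward the index) may hurt those queries.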




>That would be ok if it would be possible just for the gin index.
>
>It's obviously an issue of the planner estimation costs.
>The data I'm speaking about ("movies") has a text attribute which is
>longer than 8 KB, so it obviously has to do with
>detoasting.
>But the thoughts about @@ operators together with this GIN index seem
>also to be valid.
>
>I hope this issue is being tracked in preparation for 9.3.
>
>Regards, Stefan
>
>
>2013/7/19 Marc Mamin <M.Mamin@xxxxxxxxxxxx>:
>>
>>> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
>>> plainto_tsquery('english', 'good');
>>>
>>> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
>>> The planner obviously always chooses a table scan.
>>
>>
>> Hello,
>>
>> A probable reason for the time difference is the cost for decompressing toasted content.
>> At least in 8.3, the planner was not good at estimating it.
>>
>> I'm getting better overall performance since I stopped collecting statistics on tsvectors.
>> An alternative would have been to disallow compression on them.
>>
>> I'm aware this is a drastic measure and would not recommend it without testing; the benefit may depend on the type of data you are indexing.
>> In our use case these are error logs with many Java stack traces, hence with many poorly discriminative lexemes.
>>
>> see: http://www.postgresql.org/message-id/27953.1329434125@xxxxxxxxxxxxx
>> as a comment on
>> http://www.postgresql.org/message-id/C4DAC901169B624F933534A26ED7DF310861B363@xxxxxxxxxxxxxxxxxxxxxxxxxx
>>
>> regards,
>>
>> Marc Mamin
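The alternative Marc mentions, disallowing compression on the tsvector column, can be sketched as follows (column name taken from the example in this thread; EXTERNAL storage is my reading of his intent):

-- Store the tsvector out-of-line but uncompressed, trading disk
-- space for the decompression cost paid at query time:
ALTER TABLE FullTextSearch
    ALTER COLUMN content_tsv_gin SET STORAGE EXTERNAL;

Note this only affects newly written rows; existing rows keep their current storage until rewritten.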


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




