Search Postgresql Archives

Re: Full text index not being used

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Based on suggestions from this list, I am trying to create a tsvector column and index that, since it is perhaps the recheck and rebuilding of all the vectors that is slowing things down. I don't understand why a recheck is necessary on a gin index.....

My update statement has been running for 36 hours now and has not finished. The statement is: update source_listings set flv = to_tsvector('english', full_listing); I know that it is still working because it occasionally prints out one of those long word errors.

I have only 1.6M rows and each entry in that column is a standard size web page with just the text, maybe 3-5K.

For sure I don't have meaningful long words. Perhaps that is because it is not handling the HTML well and I should be parsing down the web page first. Hopefully that doesn't mean I need to rebuild this column over the course of 3 days - I didn't expect it to take this long so I thought I'd just try it out.





On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote:

Alex,

what text you're indexing ? I don't believe you have meaningful
very long words ( > 2047 characters).

Do you really need multicolumn index ?

I'd recommend to separate problem - create column fts for
tsvector('english',full_listing), create index on it and try full-text
query. The way you're doing imply calling to_tsvector every time you
search, which can be very costly.


Olegk

On Sun, 1 Feb 2009, Alex wrote:

So this seems to be because the result size is too big. I still don't
know why it is looping through every record and printing a warning,
but adding a LIMIT makes the queries complete in a reasonable time
(although not all that fast).

However I need to sort and also have many other facets that may or may
not be included in the query.  Adding a sort makes it load every
record again and take forever.

I tried to create an index including all of the fields I query on to
see if that would work, but I get an error the the index row is too
large:

=> create index master_index on source_listings(geo_lat, geo_lon,
price, bedrooms, region, city, listing_type, to_tsvector('english',
full_listing), post_time);
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
NOTICE:  word is too long to be indexed
DETAIL:  Words longer than 2047 characters are ignored.
ERROR:  index row requires 13356 bytes, maximum size is 8191

Any ideas about how to resolve this?



	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux