Based on suggestions from this list, I am trying to create a tsvector
column and an index on it, since it is perhaps the recheck and the
rebuilding of all the vectors that is slowing things down. I don't
understand why a recheck is necessary on a GIN index...
My update statement has been running for 36 hours now and has not
finished. The statement is: update source_listings set flv =
to_tsvector('english', full_listing); I know that it is still working
because it occasionally prints out one of those long word errors.
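If I end up having to redo this, one idea (just a sketch on my part; it
assumes source_listings has an integer primary key, which I'm calling id
here, and the index name is made up) would be to populate the column in
id ranges so each chunk commits on its own and I can see progress:

   -- sketch only: "id" is a hypothetical integer primary key
   UPDATE source_listings
      SET flv = to_tsvector('english', full_listing)
    WHERE id BETWEEN 1 AND 100000;
   -- ...repeat for each following id range, then build the index last:
   CREATE INDEX source_listings_flv_idx ON source_listings USING gin(flv);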
I have only 1.6M rows, and each entry in that column is a standard-size
web page with just the text, maybe 3-5K.
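(Back of the envelope: 1.6M rows at 3-5K each is roughly 5-8 GB of text
for to_tsvector to parse in a single statement.)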
I certainly don't have meaningful long words. Perhaps they show up
because to_tsvector is not handling the HTML well and I should be
stripping the web page down first. Hopefully that doesn't mean I need to
rebuild this column over the course of three days; I didn't expect it to
take this long, so I thought I'd just try it out.
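If the long words really are chunks of markup, a crude option might be
to strip the tags before building the vector; this regexp is only a
rough sketch, not a real HTML parser:

   -- rough sketch: drop <...> tag runs before indexing; a proper HTML
   -- parser would be more robust than a regexp
   UPDATE source_listings
      SET flv = to_tsvector('english',
                            regexp_replace(full_listing, '<[^>]*>', ' ', 'g'));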
On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote:
Alex,
What text are you indexing? I don't believe you have meaningful
very long words (> 2047 characters).
Do you really need a multicolumn index?
I'd recommend separating the problem: create a column fts for
to_tsvector('english', full_listing), create an index on it, and try a
full-text query. The way you're doing it now implies calling to_tsvector
every time you search, which can be very costly.
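Roughly something like this (the index name and the search term are only
placeholders):

   ALTER TABLE source_listings ADD COLUMN fts tsvector;
   UPDATE source_listings SET fts = to_tsvector('english', full_listing);
   CREATE INDEX source_listings_fts_idx ON source_listings USING gin(fts);

   -- query the precomputed column; 'apartment' is just an example term
   SELECT * FROM source_listings
    WHERE fts @@ to_tsquery('english', 'apartment');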
Oleg
On Sun, 1 Feb 2009, Alex wrote:
So this seems to be because the result size is too big. I still don't
know why it is looping through every record and printing a warning, but
adding a LIMIT makes the queries complete in a reasonable time (although
not all that fast).
However, I need to sort, and I also have many other facets that may or
may not be included in the query. Adding a sort makes it load every
record again and take forever.
I tried to create an index including all of the fields I query on to
see if that would work, but I get an error that the index row is too
large:
=> create index master_index on source_listings(geo_lat, geo_lon,
price, bedrooms, region, city, listing_type, to_tsvector('english',
full_listing), post_time);
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
ERROR: index row requires 13356 bytes, maximum size is 8191
Any ideas about how to resolve this?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83