Alex, can you somehow identify which documents are producing the long-word errors?
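Something like this might find them (just a sketch; it assumes PostgreSQL 8.3+ for regexp_split_to_table and a primary-key column named id, which may not match your schema):

-- list rows whose text contains a whitespace-delimited token longer than
-- the 2047-character indexing limit
SELECT DISTINCT id
  FROM (SELECT id,
               regexp_split_to_table(full_listing, E'\\s+') AS word
          FROM source_listings) AS w
 WHERE length(word) > 2047;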
Also, if you have enough space on disk, I'd recommend trying:
select *, to_tsvector('english',full_listing) as flv from source_listings;
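If disk space is a concern, a lighter variant (sketch only) is to time the to_tsvector pass in psql without keeping the output:

\timing
-- forces to_tsvector to run for every row, but discards the result
SELECT count(to_tsvector('english', full_listing)) FROM source_listings;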
I don't remember if you've already given us information about your setup
(PostgreSQL version, OS, memory, what you changed in postgresql.conf, ...).
Oleg
On Tue, 3 Feb 2009, Alex Neth wrote:
Based on suggestions from this list, I am trying to create a tsvector column
and index it, since it is perhaps the recheck and rebuilding of all the
vectors that is slowing things down. I don't understand why a recheck is
necessary on a GIN index.
My update statement has been running for 36 hours now and has not finished.
The statement is: update source_listings set flv = to_tsvector('english',
full_listing); I know that it is still working because it occasionally
prints out one of those long word errors.
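Another way I could confirm it is still busy would be to check pg_stat_activity from a second session (a sketch; the column names are as in the 8.x view):

-- shows the running statement and when it started
SELECT procpid, query_start, current_query
  FROM pg_stat_activity;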
I have only 1.6M rows, and each entry in that column is a standard-size web
page with just the text, maybe 3-5 KB.
I certainly don't have meaningful long words. Perhaps that is because the
parser is not handling the HTML well and I should be stripping the pages
down to plain text first.
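Something like this crude cleanup might help (just a sketch; a regexp is no substitute for a real HTML parser, and the tag pattern is a guess):

-- strip anything that looks like an HTML tag before building the tsvector
UPDATE source_listings
   SET flv = to_tsvector('english',
             regexp_replace(full_listing, '<[^>]*>', ' ', 'g'));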
Hopefully that doesn't mean I need to rebuild this column over the course of
3 days - I didn't expect it to take this long so I thought I'd just try it
out.
On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote:
Alex,
What text are you indexing? I don't believe you have meaningful
very long words (> 2047 characters).
Do you really need a multicolumn index?
I'd recommend separating the problems: create a column fts for
to_tsvector('english', full_listing), create an index on it, and try a
full-text query against that column. The way you're doing it now implies
calling to_tsvector every time you search, which can be very costly.
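Spelled out, that looks roughly like this (just a sketch; index names are illustrative, and it assumes PostgreSQL 8.3+ with the built-in text-search types):

ALTER TABLE source_listings ADD COLUMN fts tsvector;
UPDATE source_listings
   SET fts = to_tsvector('english', full_listing);
CREATE INDEX source_listings_fts_idx
    ON source_listings USING gin (fts);
-- keep the scalar filter columns in a separate, ordinary btree index
CREATE INDEX source_listings_attrs_idx
    ON source_listings (geo_lat, geo_lon, price, bedrooms, region, city,
                        listing_type, post_time);
-- then search against the precomputed column, not to_tsvector() at query time
SELECT *
  FROM source_listings
 WHERE fts @@ plainto_tsquery('english', 'two bedroom');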
Oleg
On Sun, 1 Feb 2009, Alex wrote:
So this seems to be because the result size is too big. I still don't
know why it is looping through every record and printing a warning,
but adding a LIMIT makes the queries complete in a reasonable time
(although not all that fast).
However, I need to sort, and I also have many other facets that may or may
not be included in the query. Adding a sort makes it load every record
again and take forever.
I tried to create an index including all of the fields I query on to
see if that would work, but I get an error that the index row is too
large:
=> create index master_index on source_listings(geo_lat, geo_lon,
price, bedrooms, region, city, listing_type, to_tsvector('english',
full_listing), post_time);
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
NOTICE: word is too long to be indexed
DETAIL: Words longer than 2047 characters are ignored.
ERROR: index row requires 13356 bytes, maximum size is 8191
Any ideas about how to resolve this?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83