Tim <elatllat@xxxxxxxxx> wrote: > So I ran this test: > unzip -p text.docx word/document.xml | perl -p -e > 's/<.+?>/\n/g;s/[^a-z0-9\n]/\n/ig;'|grep ".." > text.txt > ls -hal ./text.* > #-rwxrwxrwx 1 postgres postgres 15M 2011-06-14 15:12 ./text.docx > #-rwxrwxrwx 1 postgres postgres 29M 2011-06-14 15:17 ./text.txt > mv /tmp/text.* /var/lib/postgresql/9.0/main/ > cd ~/;psql -d postgres > #psql (9.0.4) > CREATE DATABASE test; > \q > cd ~/;psql -d test > CREATE TABLE test(title VARCHAR(256), data OID, words TSVECTOR); > INSERT INTO test VALUES ( 'text.docx', LO_IMPORT('text.docx'), > TO_TSVECTOR(pg_read_file('text.txt' ,0, 100000000)) ); > > and I got this: > #ERROR: string is too long for tsvector (30990860 bytes, max > 1048575 bytes) Your test (whatever data it is that you used) don't seem typical of English text. The entire PostgreSQL documentation in HTML form, when all the html files are concatenated is 11424165 bytes (11MB), and the tsvector of that is 364410 (356KB). I don't suppose you know of some publicly available file on the web that I could use to reproduce your problem? > The year is 2011 I don't think searching a 2MB text file is to > much to expect. Based on the ratio for the PostgreSQL docs, it seems possible to index documents considerably larger than that. Without the markup (as in the case of a PDF), I bet it would take a lot less than what I saw for the docs. A printed or typewritten page usually has about 2KB of text per page. I used pdftotext to get as text the contents of a 119 page technical book about database technology, and it came to 235KB of text. I made a tsvector for that, and it was 99KB. So, at *that* rate you'd need about 100 books that size, totaling 11,900 pages of text in a document to hit the limit you showed. Well, probably more than that, because some of the words might be repeated from one book to another. So, I'm back to wondering what problem you're trying to solve where this is actually a limitation for you. -Kevin -- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin