Re: tsvector limitations

Kevin Grittner wrote:



> Tim <elatllat@xxxxxxxxx> wrote:
> 
<...>
> Your test (whatever data it is that you used) doesn't seem typical of
> English text.  The entire PostgreSQL documentation in HTML form,
> when all the HTML files are concatenated, is 11424165 bytes (11MB),
> and the tsvector of that is 364410 (356KB).  I don't suppose you
> know of some publicly available file on the web that I could use to
> reproduce your problem?

Try trawling the texts at the Internet Archive (archive.org) -- lots of stuff
that has been rendered into ASCII: government documents from all periods, plus
novels that are no longer under copyright, so lots of long classics.

<http://www.archive.org/stream/ataleoftwocities00098gut/old/2city12p_djvu.txt> 
for example ... 765K
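
If it helps to screen a candidate file before loading it, here is a rough
sketch in Python. It counts distinct lowercased word tokens as a crude proxy
for how many lexemes a tsvector would have to hold, since a tsvector stores
each distinct lexeme once along with its positions. The function name and the
regex tokenizer are my own assumptions; this is not PostgreSQL's parser, and
it does no stemming or stop-word removal, so treat the numbers as a ballpark
only.

```python
import re


def distinct_token_stats(text):
    """Return (byte_length, total_tokens, distinct_tokens) for a text.

    Hypothetical helper: a plain regex tokenizer, not PostgreSQL's
    text search parser, with no stemming or stop-word handling.
    """
    # Lowercase, then pull out runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return len(text.encode("utf-8")), len(tokens), len(set(tokens))


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times"
    size, total, distinct = distinct_token_stats(sample)
    print(f"{size} bytes, {total} tokens, {distinct} distinct")
```

Ordinary English prose repeats a small vocabulary, so the distinct count
should stay far below the total; data that does not behave that way (random
strings, IDs, hashes) would be the kind of input that inflates a tsvector.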

HTH,

Greg Williamson

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

