Re: [GENERAL] Creation of tsearch2 index is very slow

On Sat, 21 Jan 2006, Martijn van Oosterhout wrote:

On Sat, Jan 21, 2006 at 04:29:13PM +0300, Oleg Bartunov wrote:
Martijn, you're right! We want not only to split the page into very
different parts, but also to avoid increasing the number of set bits in
the resulting signatures, which are the union (OR) of all signatures
in each part. We need not only fast index creation (thanks, Tom!),
but a better index. Some information is available here:
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
There should be a more detailed document, but I don't remember where :)
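
To illustrate the point about set bits, here is a minimal sketch of how
a page signature gets denser as member signatures are OR'ed together.
It is not tsearch2's code: the 2016-bit length is taken from the
gist_print output in this message (1698 + 318), and the CRC32 word
hash and all function names are assumptions chosen for illustration.

```python
import zlib

# Assumed signature length: the true + false bit counts reported by
# gist_print in this message add up to 2016.
SIGLEN_BITS = 2016


def word_signature(words):
    """Set one bit per word; the hash function is an assumption."""
    sig = 0
    for word in words:
        bit = zlib.crc32(word.encode("utf-8")) % SIGLEN_BITS
        sig |= 1 << bit
    return sig


def union_signature(signatures):
    """OR all member signatures together, as a parent entry would."""
    result = 0
    for sig in signatures:
        result |= sig
    return result


def popcount(sig):
    """Number of set ("true") bits in a signature."""
    return bin(sig).count("1")


# Two hypothetical documents placed on the same page.
page = [
    word_signature(["gist", "index", "picksplit"]),
    word_signature(["tsearch2", "index", "signature", "slow"]),
]
parent = union_signature(page)
print(popcount(parent), "true bits,", SIGLEN_BITS - popcount(parent), "false bits")
```

The union can never have fewer set bits than its densest member, so
grouping documents that share words keeps the parent signature sparser
than grouping unrelated ones.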

I see how it works, what I don't quite get is whether the "inverted
index" you refer to is what we're working with here, or just what's in
tsearchd?

Just tsearchd. We plan to implement an inverted index in the PostgreSQL
core and then adapt tsearch2 to use it as an option for read-only archives.


That's harder though (this algorithm does approximate it sort of)
and I haven't come up with an algorithm yet

Don't ask how hard we thought :)

Well, looking at how other people are struggling with it, it's
definitely a Hard Problem. One thing though, I don't think the picksplit
algorithm as is really requires you to strictly have the longest
distance, just something reasonably long. So I think the alternate
algorithm I posted should produce equivalent results. No idea how to
test it though...
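
One possible way to compare "longest distance" against "something
reasonably long" is sketched below. This is not the alternate algorithm
referred to above (it is not reproduced in this message) and not the
tsearch2 picksplit code; it is a generic two-sweep heuristic, with
hypothetical function names, set beside an exhaustive pair search so
the two seed choices can be compared on the same signatures.

```python
def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def exact_seeds(sigs):
    """O(n^2): indexes of the pair with the greatest Hamming distance."""
    best_dist, best_pair = -1, (0, 0)
    for i in range(len(sigs)):
        for j in range(i + 1, len(sigs)):
            d = hamming(sigs[i], sigs[j])
            if d > best_dist:
                best_dist, best_pair = d, (i, j)
    return best_pair


def approx_seeds(sigs, start=0):
    """O(n): two linear sweeps that find a reasonably distant pair.

    Sweep 1 finds the item farthest from an arbitrary start item;
    sweep 2 finds the item farthest from that one. The result is not
    guaranteed to be the most distant pair, and both indexes coincide
    when every signature is identical.
    """
    indexes = range(len(sigs))
    a = max(indexes, key=lambda i: hamming(sigs[start], sigs[i]))
    b = max(indexes, key=lambda i: hamming(sigs[a], sigs[i]))
    return a, b


# Tiny hypothetical 4-bit signatures, just to exercise both functions.
sigs = [0b0000, 0b0001, 0b1111, 0b0011]
i, j = exact_seeds(sigs)
a, b = approx_seeds(sigs)
print("exact :", hamming(sigs[i], sigs[j]))
print("approx:", hamming(sigs[a], sigs[b]))
```

Running both on the entries of a real page and comparing the resulting
seed distances would give a rough idea of how much is lost by skipping
the exhaustive search.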

You may try our development module 'gevel' to see how dense a signature is.

www=# \d v_pages
          Table "public.v_pages"
  Column   |       Type        | Modifiers
-----------+-------------------+-----------
 tid       | integer           | not null
 path      | character varying | not null
 body      | character varying |
 title     | character varying |
 di        | integer           |
 dlm       | integer           |
 de        | integer           |
 md5       | character(22)     |
 fts_index | tsvector          |
Indexes:
    "v_pages_pkey" PRIMARY KEY, btree (tid)
    "v_pages_path_key" UNIQUE, btree (path)
    "v_gist_key" gist (fts_index)

# select * from gist_print('v_gist_key') as t(level int, valid bool, a gtsvector) where level =1;
 level | valid |               a
-------+-------+--------------------------------
     1 | t     | 1698 true bits, 318 false bits
     1 | t     | 1699 true bits, 317 false bits
     1 | t     | 1701 true bits, 315 false bits
     1 | t     | 1500 true bits, 516 false bits
     1 | t     | 1517 true bits, 499 false bits
(5 rows)
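
The bit counts above can be turned into a fill ratio with a few lines
of Python. This is only a small sketch for reading the output: the
helper name is made up, and the text format it parses is simply the one
shown in the rows above.

```python
import re

# Rows copied from the gist_print output above.
rows = [
    "1698 true bits, 318 false bits",
    "1699 true bits, 317 false bits",
    "1701 true bits, 315 false bits",
    "1500 true bits, 516 false bits",
    "1517 true bits, 499 false bits",
]


def density(text):
    """Fraction of set bits in a 'N true bits, M false bits' string."""
    match = re.fullmatch(r"\s*(\d+) true bits, (\d+) false bits\s*", text)
    if match is None:
        raise ValueError("unexpected gtsvector text: %r" % text)
    true_bits, false_bits = int(match.group(1)), int(match.group(2))
    return true_bits / (true_bits + false_bits)


for row in rows:
    print("%s -> %.1f%% full" % (row, 100 * density(row)))
```

For these five level-1 entries the signatures are roughly 74% to 84%
full, which is the kind of density a better split would try to reduce.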



	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

