Re: Queryplan within FTS/GIN index -search.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I wrote:
> Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
 
>> But the particular example shown here doesn't make a very good case
>> for that, because it's hard to tell how much of a penalty would be
>> taken in more realistic examples.
>  
> Fair enough.  We're in the early stages of moving to tsearch2 and I
> haven't run across this yet in practice.  If I do, I'll follow up.
 
We have a staging database which allowed some limited testing quickly.
While it's real production data, we haven't been gathering this type
of data long, so it's got relatively few rows; therefore, it wasn't
feasible to try any tests which would be disk-bound, so I primed the
cache for all of these, and they are all totally served from cache. 
For various reasons which I'll omit unless asked, we do our text
searches through functions which take a "selection string", turn it
into a tsquery with a little extra massaging on our part, run the
query with a minimum ranking to return, and return a set of records
ordered by the ranking in descending sequence.
 
Under these conditions there is a slight performance gain in adding an
additional test which matches 1356 out of 1691 rows.  Not surprisingly
for a fully cached query set, timings were very consistent from run to
run.  While undoubtedly a little unusual in approach, this is
production software run against real-world data.  I confirmed that it
is using the GIN index on the tsvector for these runs.
 
By the way, the tsearch2 features have been received very well so
far.  One of the first reactions from most users is surprise at how
fast it is.  :-)  Anyway, our production results don't confirm the
issue shown with the artificial test data.
 
 
scca=> select count(*) from "DocThumbnail" where "text" is not null;
 count
-------
  1691
(1 row)

Time: 0.619 ms


scca=> select count(*) from (select "DocThumbnail_text_rank"('guardian
ad litem', 0.1)) x;
 count
-------
    41
(1 row)

Time: 19.394 ms


scca=> select count(*) from (select "DocThumbnail_text_rank"('guardian
ad litem attorney', 0.1)) x;
 count
-------
     4
(1 row)

Time: 16.434 ms


scca=> select count(*) from (select
"DocThumbnail_text_rank"('attorney', 0.1)) x;
 count
-------
  1356
(1 row)

Time: 415.056 ms


scca=> select count(*) from (select "DocThumbnail_text_rank"('guardian
ad litem party', 0.1)) x;
 count
-------
     2
(1 row)

Time: 16.290 ms


scca=> select count(*) from (select "DocThumbnail_text_rank"('party',
0.1)) x;
 count
-------
   935
(1 row)

Time: 386.941 ms


-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux