Re: Comparing tsvector results

"David G. Johnston" <david.g.johnston@xxxxxxxxx> · Thu, 16 Jun 2016 15:16:08 -0400

Please do not top-post.

On Thu, Jun 16, 2016 at 2:38 PM, David G. Johnston <david.g.johnston@xxxxxxxxx> wrote:
On Thu, Jun 16, 2016 at 2:21 PM, Liza Sazonova <liza@xxxxxxxxxxxxx> wrote:
Hello,
I'm trying to implement a keyword-based search on my database. 

I have some text input (Say, "The A or B and C with D") and a list of keywords ("B","X","Y"). I want to identify which keywords are present in the text input. I also want to use a thesaurus to process the text input, since the keywords have synonyms (for example, "The A : X"). 

As far as I understand, the only way for me to process text input with a thesaurus would be to use tsvector / tsquery; or at least this is the fastest way. 
Is it possible to retrieve the tsvector from the text input, then retrieve the tsvector from the keyword list, and identify which words are present in both?

From its description the tsvector_to_array function sounds like it would be helpful here.

https://www.postgresql.org/docs/9.6/static/functions-textsearch.html#TEXTSEARCH-FUNCTIONS-TABLE

tsvector_to_array(input) -> unnest
intersect
tsvector_to_array(keywords) -> unnest

David J.

On Thu, Jun 16, 2016 at 2:51 PM, Liza Sazonova <liza@xxxxxxxxxxxxx> wrote:
Thank you!
It seems like this function's not available in PostgreSQL 9.5 that I'm using 
I'm getting: "ERROR:  function tsvector_to_array(unknown) does not exist".

Do you know of any equivalent function in earlier releases?

I was thinking of casting ts_vector into text, and then splitting the text using a delimiter; but this seems like a slow solution and I was wondering if there's anything better.

Not that I can see nor would expect given someone to the time to write it for 9.6 (sorry, about not pointing that out - blindly suggesting solutions only available in beta is not something I usually do).

Your best bet I'd say is using "strip(tsvector)" to clean the inputs, then use "string_to_array(...)" to get your array and then unnest/intersect

https://www.postgresql.org/docs/current/static/functions-array.html

There may be other solutions available in 3rd-party extensions or that I'm just missing but just using standard 9.5 stuff that's what I see.

I'd solve your problem using a custom function and if you indeed feel it is too slow then explore alternatives.

David J.