Hey everyone,

I think it's great that the full text search parser breaks hyphenated words
into multiple parts. This really could help, but something is not right.

rasmas_hackathon=> select * from ts_debug( 'gn-foo' );
      alias      |           description           |  token  |  dictionaries  |  dictionary  | lexemes
-----------------+---------------------------------+---------+----------------+--------------+----------
 asciihword      | Hyphenated word, all ASCII      | gn-foo  | {english_stem} | english_stem | {gn-foo}
 hword_asciipart | Hyphenated word part, all ASCII | gn      | {english_stem} | english_stem | {gn}
 blank           | Space symbols                   | -       | {}             |              |
 hword_asciipart | Hyphenated word part, all ASCII | foo     | {english_stem} | english_stem | {foo}
 blank           | Space symbols                   |         | {}             |              |
(6 rows)

But why does to_tsquery() AND the whole word and its parts together?

rasmas_hackathon=> select * from to_tsquery( 'gn-foo | bandage' );
             to_tsquery
------------------------------------
 'gn-foo' & 'gn' & 'foo' | 'bandag'
(1 row)

Suppose my vector looks like this:

rasmas_hackathon=> select to_tsvector( 'gn series bandage' );
         to_tsvector
-----------------------------
 'bandag':3 'gn':1 'seri':2
(1 row)

The rank is poor. The vector has neither 'gn-foo' nor 'foo', so the ANDed
group can never match and only 'bandag' contributes:

rasmas_hackathon=> select ts_rank_cd( to_tsvector( 'gn series bandage' ), to_tsquery( 'gn-foo | bandage' ) );
 ts_rank_cd
------------
        0.1
(1 row)

Without the hyphenated term the rank is better, even though 'gn' was one of
the parts produced above:

rasmas_hackathon=> select ts_rank_cd( to_tsvector( 'gn series bandage' ), to_tsquery( 'gn | bandage' ) );
 ts_rank_cd
------------
        0.2
(1 row)

So wouldn't this be a better query for hyphenated words?

'gn-foo' | 'gn' | 'foo'

Aside: Best I can tell, the parser is giving instructions to pushval_morph()
to treat hyphenated words as "same variants".

thanks,
Brian

--
http://brian.derocher.org
http://mappingdc.org
http://about.me/brian.derocher
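
P.S. To make the proposal concrete, here is a minimal client-side sketch in Python of the OR expansion I have in mind. This is not something PostgreSQL does itself, and the names quote_lexeme and or_expand_hyphenated are hypothetical. It assumes the resulting string is cast directly with ::tsquery, which skips normalization, so the terms have to be stemmed already (for example 'bandag' rather than 'bandage').

```python
def quote_lexeme(lexeme: str) -> str:
    """Quote a lexeme for tsquery text input, doubling quotes and backslashes."""
    escaped = lexeme.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def or_expand_hyphenated(term: str) -> str:
    """OR a hyphenated term with its parts instead of ANDing them.

    Hypothetical helper: assumes the result is cast with ::tsquery,
    so the term must already be normalized (stemmed) by the caller.
    """
    # Drop empty pieces produced by leading, trailing, or doubled hyphens.
    parts = [p for p in term.split("-") if p]
    if len(parts) < 2:
        # Not really a hyphenated word: return it as a single lexeme.
        return quote_lexeme(term)
    # Whole word first, then each part, all joined with OR.
    return " | ".join(quote_lexeme(x) for x in [term, *parts])


if __name__ == "__main__":
    print(or_expand_hyphenated("gn-foo"))  # 'gn-foo' | 'gn' | 'foo'
```

The fragment could then be joined with other terms using | and passed to the server as a parameter cast to tsquery, which is one way to compare its rank against the ANDed form that to_tsquery() generates.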