
Re: Initial ugly reverse-translator


 



Tom Lane wrote:

I don't really see the problem.  I assume from your reference to pg_trgm
that you're using trigram similarity as the prefilter for potential
matches

It turns out that's no good anyway, as it appears to ignore characters outside the ASCII range. Rather less than useful for searching a database of translated strings ;-)

so a slow final LIKE match shouldn't be an issue really.
(And besides, speed doesn't seem like the be-all and end-all here.)

True. It's not so much the speed as the fragility when faced with small changes to formatting. In addition to whitespace, some clients mangle punctuation with features like automatic "curly"-quoting.

AFAICS you just need to translate %-string format escapes to %, quote
any other % or _, and away you go.
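
The conversion described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name format_to_like, the regular expression, and the choice of backslash as the escape character (the default for LIKE in PostgreSQL) are assumptions, and the regular expression covers common printf-style conversions plus %m rather than the full format grammar.

```python
import re

# Assumed approximation of a printf-style conversion: optional "n$" position,
# flags, width, precision, length modifier, then a conversion character.
# "m" is included because PostgreSQL messages use %m for the errno text.
_FORMAT_ESCAPE = re.compile(
    r"%(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*)?)?"
    r"(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfgGcspm]"
)


def format_to_like(fmt: str, escape: str = "\\") -> str:
    """Turn a format string into a LIKE pattern (hypothetical helper)."""
    out = []
    pos = 0
    while pos < len(fmt):
        # "%%" is a literal percent sign, so it must be quoted for LIKE.
        if fmt.startswith("%%", pos):
            out.append(escape + "%")
            pos += 2
            continue
        # A format escape such as %s or %5.2f becomes the LIKE wildcard %.
        match = _FORMAT_ESCAPE.match(fmt, pos)
        if match:
            out.append("%")
            pos = match.end()
            continue
        # Any other %, _ or escape character is quoted so it matches literally.
        ch = fmt[pos]
        if ch in ("%", "_", escape):
            out.append(escape + ch)
        else:
            out.append(ch)
        pos += 1
    return "".join(out)


print(format_to_like('could not open file "%s": %m'))
print(format_to_like("50%% of table_name"))
```

The resulting string would be used as the right-hand side of a LIKE comparison against the message being looked up. One known limitation of this sketch: a percent sign followed by a space and a conversion letter (as in "100% sure") is treated as a format escape, because a space is a valid printf flag.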

One thing that might be worth doing is avoiding spacing sensitivity,
since whitespace is frequently mangled in copy-and-paste.  Perhaps
strip all spaces from both strings before matching?

Yep, that sounds pretty reasonable. As usual I'm making things more complicated than they need to be. I suspect it'll be necessary to strip quotes and some other punctuation too, but that's not a big deal.
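
As a rough illustration of that normalization (a sketch only; the helper name normalize and the rule of dropping every character in a Unicode punctuation category are assumptions, not something settled in this thread), both strings could be reduced to a form without whitespace or punctuation before they are compared:

```python
import unicodedata


def normalize(text: str) -> str:
    """Drop whitespace and punctuation so formatting changes do not matter."""
    kept = []
    for ch in text:
        # Covers spaces, tabs, newlines and non-breaking spaces.
        if ch.isspace():
            continue
        # Unicode categories starting with "P" are punctuation; this includes
        # straight quotes as well as "curly" quotes and guillemets.
        if unicodedata.category(ch).startswith("P"):
            continue
        kept.append(ch)
    return "".join(kept)


original = 'could not open file "foo.conf": Permission denied'
pasted = "could not open  file \u201cfoo.conf\u201d:\nPermission denied"

print(normalize(original) == normalize(pasted))
```

Letters outside the ASCII range are kept, so translated text survives. Note that % and _ are punctuation too, so a format string would have to be turned into a pattern in a way that keeps its wildcards; this sketch only shows the comparison of two already-rendered messages.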

--
Craig Ringer

