Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> writes:
> Tom Lane wrote:
>> I don't really see the problem.  I assume from your reference to pg_trgm
>> that you're using trigram similarity as the prefilter for potential
>> matches

> It turns out that's no good anyway, as it appears to ignore characters
> outside the ASCII range. Rather less than useful for searching a
> database of translated strings ;-)

A quick look at the pg_trgm code suggests that it is only prepared to
deal with single-byte encodings; if you're working in UTF8, which I
suppose you'd have to be, it's dead in the water :-(.  Perhaps fixing
that should be on the TODO list.

But in any case maybe the full-text-search stuff would be more useful
as a prefilter?  Although honestly, for the speed we need here, I'm not
sure a prefilter is needed at all.  Full text might be useful if a
LIKE-based match fails, though.

>> (And besides, speed doesn't seem like the be-all and end-all here.)

> True. It's not so much the speed as the fragility when faced with small
> changes to formatting. In addition to whitespace, some clients mangle
> punctuation with features like automatic "curly"-quoting.

Yeah.  I was wondering whether encoding differences wouldn't be a huge
problem in practice, as well.

			regards, tom lane
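
For illustration only: the formatting fragility mentioned above (whitespace changes, automatic "curly" quoting) could be reduced by normalizing both strings before they are compared. The following is a minimal sketch in Python, not something proposed in the thread. The function name `normalize_for_match` and the quote mapping are assumptions made for this example, and the NFC step only reconciles composed versus decomposed Unicode forms; it does not convert between character encodings.

```python
import unicodedata

# Hypothetical mapping from "smart" quotes to their ASCII equivalents.
# The set of characters covered here is an assumption, not an exhaustive list.
_QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_for_match(text: str) -> str:
    """Return a canonical form of text for tolerant string comparison."""
    # Compose characters so "e" + combining acute equals the precomposed form.
    text = unicodedata.normalize("NFC", text)
    # Replace curly quotes with straight ones.
    text = text.translate(_QUOTE_MAP)
    # Collapse every run of whitespace to one space and trim both ends.
    return " ".join(text.split())
```

Both the stored string and the incoming string would need to go through the same function, so that a comparison sees only the normalized forms.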