Search Postgresql Archives

Re: Fuzzy string matching of product names

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



George Silva wrote:
> The above is true. For geocoding the same idea is used: the metaphone
> function is used against street names, and searched to a simples column,
> filled with the results of the metaphone function. It works quite well.

I would think an expression index would be better than a separate
column.

---------------------------------------------------------------------------


> 
> George
> 
> On Mon, Apr 5, 2010 at 4:23 PM, Brian Modra <brian@xxxxxxxxxxxxx> wrote:
> 
> > On 05/04/2010, Peter Geoghegan <peter.geoghegan86@xxxxxxxxx> wrote:
> > > Hello,
> > >
> > > At the moment, users of my application, which runs on 8.4.3, may
> > > search for products in a way that is implemented roughly like this:
> > >
> > > SELECT * FROM products WHERE description ILIKE '%%usr_string%%';
> > >
> > > This works reasonably well. However, I thought it would be a nice
> > > touch to give my users leeway to spell product names incorrectly when
> > > searching, or to not have to remember if a product is entered as "coca
> > > cola", "CocaCola" or "Coca-cola". At the moment, they don't have to
> > > worry about case sensitivity because I use ILIKE - I'd like to
> > > preserve that. I'd also like to not have it weigh against them heavily
> > > when they don't search for a specific product, but just a common
> > > substring. For example, if they search for "coca-cola", there may be a
> > > number of different coca-cola products: "CocaCola 330ml can",
> > > "Coca-Cola 2 litre bottle", but no actual plain "cocacola". That ought
> > > to not matter too much - all cocacola products should be returned.
> > >
> > > This isn't important enough for me to be willing to add a big
> > > dependency to my application. I'd really prefer to limit myself to the
> > > contrib modules. pg_trgm and fuzzystrmatch look very promising, but
> > > it's not obvious how I can use either to achieve what I want.
> > > Postgres's built-in regex support may have a role to play too.
> > >
> > > I can live with it not being indexable, because typically there are
> > > only tens of thousands of products in a production system.
> > >
> > > Could someone suggest an approach that is reasonably simple and
> > > reasonably generic ?
> >
> > What I do is to create another column that has a simplified version of
> > the string in it.
> > (I created a function to simplify strings, and when the source column
> > is changed or inserted, I also update the "simplified" column.
> > Then when searching, I use the same function to "simplify" the search
> > string and use "=" to test against the "simplified" column.
> >
> > E.g.
> > if the table has a column called "name" that you want to search, you
> > create a name_simplified column, and fill it as so:
> > update your_table set name_simplified=yourSimplifyFunction(name);
> >
> > Then to search:
> > select * from your_table where simplified_name =
> > yourSimplifyFunction('Coca-Cola');
> >
> > This is really fast, because the match is using the index rather than
> > a sequential scan.
> >
> > >
> > > Thanks,
> > > Peter Geoghegan
> > >
> > > --
> > > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> > > To make changes to your subscription:
> > > http://www.postgresql.org/mailpref/pgsql-general
> > >
> >
> >
> > --
> > Brian Modra   Land line: +27 23 5411 462
> > Mobile: +27 79 69 77 082
> > 5 Jan Louw Str, Prince Albert, 6930
> > Postal: P.O. Box 2, Prince Albert 6930
> > South Africa
> > http://www.zwartberg.com/
> >
> > --
> > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-general
> >
> 
> 
> 
> -- 
> George R. C. Silva
> 
> Desenvolvimento em GIS
> http://blog.geoprocessamento.net

-- 
  Bruce Momjian  <bruce@xxxxxxxxxx>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux