Re: Determining the similarity between a user supplied short piece of text (between 5 and 15 characters) and a list of similar length text items.

On 19 July 2010 19:46, tedd <tedd.sperling@xxxxxxxxx> wrote:
> At 12:39 PM +0100 7/19/10, Richard Quadling wrote:
>>
>> I'm using MS SQL, not mySQL.
>>
>> Found an extended stored procedure with a UDF.
>>
>> In testing, it looks excellent.
>>
>> Searching for a match on 30,000 vehicles takes next to no additional
>> time: a few seconds in total, compared to over 3 minutes when searching
>> using SQL code.
>
> That seems a bit slow.
>
> For example, currently I'm searching over 4,000 records (which contain
> 4,000 paragraphs taken from the text of the King James version of the Bible)
> for matching words, such as %created% and the times are typically around
> 0.009 seconds.
>
> As such, searching ten times that amount should be in the range of tenths of
> a second and not seconds -- so taking a few seconds to search 30,000 records
> seems excessive to me.


Tedd,

I'm not looking for a "word". I'm looking for similar "wrds".

"words" is closer to the misspelled "wrds" than it is to "wars".

select dbo.DamerauLevenshteinDistance('words', 'wars'),
dbo.DamerauLevenshteinDistance('words', 'wrds')

(No column name)	(No column name)
2	1

Lower is better.
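
For anyone who wants to check those two numbers without the UDF, here is a
minimal Python sketch of the restricted Damerau-Levenshtein distance (also
called optimal string alignment). It is an assumption that the stored
procedure uses this variant; the name osa_distance is made up for
illustration and is not part of the UDF.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent characters. Hypothetical helper, not the SQL UDF itself.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent characters, e.g. "ab" -> "ba".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("words", "wars"), osa_distance("words", "wrds"))
```

"words" to "wrds" needs one deletion (the "o"), while "words" to "wars"
needs a substitution and a deletion, which matches the 2 and 1 above.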

Also, I have to compare the input against every row in the set and then
sort the results to find the lowest values for the Damerau-Levenshtein
distance or the highest for the Jaro–Winkler similarity.

As the entered value is never known in advance, I can't
pre-calculate the distances.

I do an exact match test first.
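
The overall flow (exact match first, otherwise score every row and sort)
can be sketched roughly as below. This is not my actual code: it is Python
rather than SQL, rank_candidates and its parameters are invented for the
example, and difflib's SequenceMatcher ratio stands in for Jaro-Winkler
purely because it ships with the standard library. Like Jaro-Winkler, a
higher score means a closer match.

```python
from difflib import SequenceMatcher


def rank_candidates(query, candidates, limit=3):
    """Return the best matches for query, closest first.

    Hypothetical helper: an exact hit short-circuits the scoring, otherwise
    every candidate is scored and the list is sorted by similarity.
    """
    # Cheap exact-match test first; no scoring needed if it succeeds.
    if query in candidates:
        return [query]

    # The query is only known at request time, so nothing can be
    # pre-calculated: score every candidate against it.
    scored = [
        (SequenceMatcher(None, query, candidate).ratio(), candidate)
        for candidate in candidates
    ]
    # Highest similarity first (for a distance you would sort ascending).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:limit]]


print(rank_candidates("wrds", ["ford", "wars", "words"]))
```

With a distance function such as Damerau-Levenshtein the only change is
the sort direction, since lower is better there.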

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



