Re: Determining the similarity between a user supplied short piece of text (between 5 and 15 characters) and a list of similar length text items.

Andrew Ballard <aballard@xxxxxxxxx> · Mon, 19 Jul 2010 18:06:53 -0400

On Mon, Jul 19, 2010 at 2:46 PM, tedd <tedd.sperling@xxxxxxxxx> wrote:
> At 12:39 PM +0100 7/19/10, Richard Quadling wrote:
>>
>> I'm using MS SQL, not mySQL.
>>
>> Found a extended stored procedure with a UDF.
>>
>> Testing it looks excellent.
>>
>> Searching for a match on 30,000 vehicles next to no additional time -
>> a few seconds in total, compared to the over 3 minutes to search using
>> SQL code.
>
> That seems a bit slow.
>
> For example, currently I'm searching over 4,000 records (which contains
> 4,000 paragraphs taken from the text of the King James version of the Bible)
> for matching words, such as %created% and the times are typically around
> 0.009 seconds.
>
> As such, searching ten times that amount should be in the range of tenths of
> a second and not seconds -- so taking a few seconds to search 30,000 records
> seems excessive to me.
>
> Cheers,
>
> tedd

I would be surprised if a Levenshtein or similar_text comparison in a
database were NOT slower than even a wildcard search, because the
distance has to be calculated for every row in the column being
compared. On top of that, user-defined functions in SQL Server often
carry a performance penalty of their own.

Just for kicks, you could try loading the values in that column into
an array in PHP and then time iterating the array to calculate the
Levenshtein distances for each value to see how it compares.
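
Something along these lines would do as a rough sketch of that test.
The data here is made up: $values stands in for whatever you load from
the column, and closestMatch() is just a name I picked for the helper.
levenshtein() and microtime() are the standard PHP functions.

```php
<?php
// Return the closest value to $needle and its Levenshtein distance.
// closestMatch() is a hypothetical helper, not an existing PHP function.
function closestMatch(string $needle, array $values): array
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($values as $value) {
        $distance = levenshtein($needle, $value);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $value;
        }
    }
    return [$best, $bestDistance];
}

// Placeholder data: in practice, fill this from the database column.
$values = [];
for ($i = 0; $i < 30000; $i++) {
    $values[] = 'VEHICLE ' . $i;
}
$values[] = 'FORD TRANSAT';

// Time one pass over the whole array.
$start = microtime(true);
[$best, $bestDistance] = closestMatch('FORD TRANSIT', $values);
$elapsed = microtime(true) - $start;

printf("Closest: %s (distance %d), %.4f seconds\n", $best, $bestDistance, $elapsed);
```

The timing will depend on the machine and on how long the strings
are, so treat the number as a comparison point against the UDF rather
than as a benchmark.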

Andrew
