Re: Fast prefix search?

"Nathan Nobbe" <quickshiftin@xxxxxxxxx> · Sat, 13 Oct 2007 13:27:01 -0400

On 10/13/07, js <ebgssth@xxxxxxxxx> wrote:
>
> On 10/14/07, Nathan Nobbe <quickshiftin@xxxxxxxxx> wrote:
> > can you use the php string manipulation functions ?
>
> I'll probably use strstr() to check whether a string starts with some
> prefix.
> But the problem I'd like to solve is how to efficiently pick strings
> starting with a prefix from a large dataset, like a dictionary.
>
> If Berkeley DB's set_range were available from PHP,
> I could write something like
>
> // get the first word starting with $prefix
> $word = dba_set_range($prefix, $dictionary);
> do {
>   if (prefix($word) != $prefix) break;
>   $found[] = $word;
> } while ($word = dba_nextkey($dictionary));

how big is your dataset? have you tested against a realistic data set and
actually gotten long execution times? if not, a plain loop may be fast
enough:

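foreach ($dictionary as $curValue) {
    // strpos() returns 0 when $prefix sits at the very start of the string;
    // compare with === because 0 == false in a loose comparison
    if (strpos($curValue, $prefix) === 0) {
        $found[] = $curValue;
    }
}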

notice how i used strpos rather than strstr, because i got this tip from
the docs:
*Note:* If you only want to determine if a particular *needle* occurs
within *haystack*, use the faster and less memory intensive function
strpos() <http://www.php.net/manual/en/function.strpos.php> instead.
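
one gotcha with strpos for prefix checks: a match at the start comes back
as offset 0, and no match comes back as false, so the result has to be
compared with ===. a minimal sketch (has_prefix and has_prefix_strncmp are
made-up helper names, not php built-ins):

```php
<?php
// hypothetical helper: true when $word begins with $prefix
function has_prefix($word, $prefix) {
    // strpos() gives the offset of the first match, or false if there is none;
    // offset 0 means "at the start", so it has to be checked with ===
    return strpos($word, $prefix) === 0;
}

// same check with strncmp(), which only looks at the first strlen($prefix) bytes
function has_prefix_strncmp($word, $prefix) {
    return strncmp($word, $prefix, strlen($prefix)) === 0;
}

var_dump(has_prefix('apple', 'app'));      // bool(true)
var_dump(has_prefix('pineapple', 'app'));  // bool(false), the match is at offset 4
var_dump(has_prefix('banana', 'app'));     // bool(false)
```

the strncmp version never scans past the prefix length, which may matter
when the words are long.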

there are some optimization points as well.  if you use a for loop, i think
it may run a hair faster than the foreach, though that is worth timing on
your own data; just make sure to store the length of the dictionary in a
variable and use that as the loop bound.  also, if your search can be case
insensitive you can use stripos instead of strpos, though i would not
expect a speed bump from it, since it has to fold case before comparing.
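
a rough sketch of the for loop version with the count cached, using
stripos for a case-insensitive match. the sample words are made up, and
it assumes $dictionary is a plain list indexed 0..n-1:

```php
<?php
// made-up sample data; in practice $dictionary would be the real word list
$dictionary = array('Apple', 'apply', 'banana', 'application', 'grape');
$prefix = 'app';

$found = array();
$count = count($dictionary);   // computed once, not on every iteration
for ($i = 0; $i < $count; $i++) {
    // stripos() ignores case; 0 means the match is at the very start
    if (stripos($dictionary[$i], $prefix) === 0) {
        $found[] = $dictionary[$i];
    }
}

print_r($found);  // Apple, apply, application
```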

if the string functions in a loop are too slow for the size of your
datasets, you might write a program that uses berkdb and call it from
within php via the shell.
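
before going out to the shell, something that might be worth trying in
plain php: keep the dictionary sorted, binary search for the first word
that is >= $prefix (roughly what set_range would give you), then walk
forward while the prefix still matches. this is a swapped-in technique,
not berkdb, and lower_bound / prefix_search are made-up helper names. it
assumes the whole list fits in memory as a 0-indexed array:

```php
<?php
// hypothetical helper: index of the first element >= $key in a sorted list
function lower_bound($sorted, $key) {
    $lo = 0;
    $hi = count($sorted);
    while ($lo < $hi) {
        $mid = (int) (($lo + $hi) / 2);
        if (strcmp($sorted[$mid], $key) < 0) {
            $lo = $mid + 1;   // everything up to $mid sorts before $key
        } else {
            $hi = $mid;       // $mid could be the answer, keep it in range
        }
    }
    return $lo;
}

// hypothetical helper: all words in the sorted list that start with $prefix
function prefix_search($sorted, $prefix) {
    $found = array();
    $len = strlen($prefix);
    $n = count($sorted);
    for ($i = lower_bound($sorted, $prefix); $i < $n; $i++) {
        if (strncmp($sorted[$i], $prefix, $len) !== 0) {
            break;  // sorted order means no later word can match
        }
        $found[] = $sorted[$i];
    }
    return $found;
}

// made-up sample data
$dictionary = array('banana', 'apple', 'apply', 'grape', 'application');
sort($dictionary, SORT_STRING);   // byte-wise order, same ordering strcmp() uses
print_r(prefix_search($dictionary, 'app'));  // apple, application, apply
```

the sort is a one-time cost; after that each lookup only touches about
log2(n) words plus the matches themselves.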

-nathan