Re: Fast prefix search?

On 10/13/07, js <ebgssth@xxxxxxxxx> wrote:
>
> On 10/14/07, Nathan Nobbe <quickshiftin@xxxxxxxxx> wrote:
> > how big is your dataset; have you tested against a potential data set
> > and gotten long execution times?
>
> The dataset consists of several million lines,
> and I've tested a script like the one above against it.
> It took almost an hour.
> (A Perl script that uses set_range finished the job within 2 minutes.)
>
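
For comparison, the kind of range seek that set_range performs (as far as I understand it, positioning a cursor at the first key greater than or equal to the search key) can be approximated in plain PHP with a binary search over a sorted array. This is only a sketch: it assumes PHP 7 or later, that the whole dictionary fits in memory, that it was sorted with sort($words, SORT_STRING), and that the goal is to list the entries starting with a given prefix. lowerBound and prefixRange are made-up names.

```php
<?php
// Index of the first element >= $key in an array sorted byte-wise (lower bound).
function lowerBound(array $sorted, string $key): int
{
    $lo = 0;
    $hi = count($sorted);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($sorted[$mid], $key) < 0) {
            $lo = $mid + 1;   // everything up to $mid sorts before $key
        } else {
            $hi = $mid;       // $mid may be the answer, keep it in range
        }
    }
    return $lo;
}

// All entries of the sorted array that start with $prefix.
function prefixRange(array $sorted, string $prefix): array
{
    $matches = [];
    $len = strlen($prefix);
    $n = count($sorted);
    // Matching entries are contiguous, so stop at the first non-match.
    for ($i = lowerBound($sorted, $prefix); $i < $n; $i++) {
        if (strncmp($sorted[$i], $prefix, $len) !== 0) {
            break;
        }
        $matches[] = $sorted[$i];
    }
    return $matches;
}
```

Each lookup costs a logarithmic number of comparisons plus one step per match, instead of a pass over every line.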
> > notice how i used strpos rather than strstr, because i got this tip from
> > the docs:
> > Note: If you only want to determine if a particular needle occurs within
> > haystack, use the faster and less memory intensive function strpos()
> > instead.
>
> On 10/14/07, Robert Cummings <robert@xxxxxxxxxxxxx> wrote:
> > So don't use strstr() use strpos(). Specifically use it like follows:
> >
> >     if( strpos( $haystack, $prefix ) === 0 )
> >     {
> >         // it's a prefix.
> >     }
> >
>
> Great tip. Thank you!
>
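
One caveat on the strpos() check: when the prefix is not at position 0, strpos() still scans the rest of the haystack looking for it elsewhere. Comparing only the first strlen($prefix) bytes with strncmp() avoids that. A minimal sketch, assuming PHP 7 or later; hasPrefix and filterByPrefix are made-up helper names:

```php
<?php
// True when $haystack begins with $prefix (an empty prefix matches everything).
function hasPrefix(string $haystack, string $prefix): bool
{
    // Compares at most strlen($prefix) bytes, so the rest of $haystack is never scanned.
    return strncmp($haystack, $prefix, strlen($prefix)) === 0;
}

// Entries of $dictionary that start with $prefix, in their original order.
function filterByPrefix(array $dictionary, string $prefix): array
{
    $matches = [];
    foreach ($dictionary as $word) {
        if (hasPrefix($word, $prefix)) {
            $matches[] = $word;
        }
    }
    return $matches;
}
```

This is still one comparison per dictionary line, so it trims the constant factor rather than changing how the search scales.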
> > there are some optimization points as well.  if you use a for loop, i
> > think it will run a hair faster than the foreach; just make sure to store
> > the length of the dictionary in a variable and use that as the sentinel
> > control variable.  also, if your search needs to be case insensitive you
> > can use stripos instead of strpos, though expect that to be a little
> > slower rather than faster.
>
> I'll try.
>
> > if the string functions in a loop are too slow due to the size of your
> > datasets, you might write a program that uses Berkeley DB and call it
> > using the shell from within php.
>
> that sounds like a shell script...
>

yes, you could write a program in C and call it via the shell from within
php.
say you had a program findPrefixes that takes as an argument a file
containing the dictionary to search and prints the name of a file holding
the matches (which it creates).
there are several options for calling it from within php.
i would use the backticks; something like this:

// findPrefixes prints the name of the match file; trim the trailing newline
$matchFile = trim(`findPrefixes $dictionaryFile`);
$matches = file($matchFile);

if you write it in C, the chances of beating the Perl script are good.
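
if the dictionary path can come from user input, it is worth escaping it and checking the exit status rather than interpolating it straight into backticks. a rough sketch, assuming PHP 7 or later and that the external program prints the match file's name on its first line of output; findPrefixMatches is a made-up wrapper name, and findPrefixes itself is still the hypothetical program described above:

```php
<?php
// Runs $command with the dictionary path as its only argument and returns
// the lines of the match file whose name the command prints on stdout.
function findPrefixMatches(string $command, string $dictionaryFile): array
{
    // Quote the argument so spaces or shell metacharacters cannot break the call.
    $cmd = escapeshellcmd($command) . ' ' . escapeshellarg($dictionaryFile);

    $output = [];
    exec($cmd, $output, $exitCode);
    if ($exitCode !== 0 || empty($output)) {
        throw new RuntimeException("command failed: $cmd");
    }

    // First output line is taken to be the name of the match file.
    $matchFile = trim($output[0]);
    $matches = file($matchFile, FILE_IGNORE_NEW_LINES);
    if ($matches === false) {
        throw new RuntimeException("cannot read match file: $matchFile");
    }
    return $matches;
}
```

usage would look like $matches = findPrefixMatches('findPrefixes', $dictionaryFile);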

also, if you're considering a database solution (rdbms), i recommend
sqlite3.  it's not
nearly as heavy as a server-based solution.
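
to give an idea of what the sqlite route could look like, here is a rough sketch using PDO with the pdo_sqlite extension, assuming PHP 7 or later and that the goal is to list the words starting with a given prefix. the table and column names (words, word) and the function name wordsWithPrefix are made up, and the upper bound trick assumes a non-empty prefix whose last byte is not 0xFF:

```php
<?php
// All words starting with $prefix, found with a range scan on the primary key.
function wordsWithPrefix(PDO $db, string $prefix): array
{
    if ($prefix === '') {
        throw new InvalidArgumentException('prefix must not be empty');
    }
    // Upper bound: the prefix with its last byte incremented, e.g. "app" -> "apq".
    $upper = substr($prefix, 0, -1) . chr(ord(substr($prefix, -1)) + 1);

    $stmt = $db->prepare(
        'SELECT word FROM words WHERE word >= ? AND word < ? ORDER BY word'
    );
    $stmt->execute([$prefix, $upper]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Small in-memory database for illustration; a real one would live in a file.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE words (word TEXT PRIMARY KEY)');

$insert = $db->prepare('INSERT INTO words (word) VALUES (?)');
foreach (['apple', 'apply', 'banana', 'app', 'apricot'] as $w) {
    $insert->execute([$w]);
}

$matches = wordsWithPrefix($db, 'app');
```

the primary key gives sqlite an index on word, so the two comparisons can be answered with an index range scan instead of a pass over every row.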

-nathan
