On 10/13/07, js <ebgssth@xxxxxxxxx> wrote:
> On 10/14/07, Nathan Nobbe <quickshiftin@xxxxxxxxx> wrote:
> > how big is your dataset; have you tested against a potential data set
> > and gotten long execution times?
>
> The dataset consists of about several million lines
> and I've tested script like above against the dataset.
> it took almost a hour.
> (perl script that uses set_range finished the job within 2 minutes)
>
> > notice how i used strpos rather than strstr, because i got this tip
> > from the docs
> > Note: If you only want to determine if a particular needle occurs
> > within haystack, use the faster and less memory intensive function
> > strpos() instead.
>
> On 10/14/07, Robert Cummings <robert@xxxxxxxxxxxxx> wrote:
> > So don't use strstr() use strpos(). Specifically use it like follows:
> >
> > if( strpos( $haystack, $prefix ) === 0 )
> > {
> >     // it's a prefix.
> > }
>
> Great tip. Thank you!
>
> > there are some optimization points as well. if you use a for loop, i
> > think it will run a hair faster than the foreach, just make sure to
> > store the length of the dictionary in an variable and use that as the
> > sentinel control variable. also, if your search can be case insensitive
> > you can use stripos instead of strpos, that will probly get you a
> > little speed bump as well.
>
> I'll try.
>
> > if the string methods in a loop are too doggy due to the size of your
> > datasets, you might write a program that uses berkdb and call it using
> > the shell from within php.
>
> that seems a shellscript...

yes, you could write a program in c and call it via the shell from within php. say you had a program findPrefixes that takes as an argument a file containing the dictionary to search and spits out the filename of the matches (which it creates). there would be several options to call it from within php.
i would use the backticks; something like this:

    // trim() drops any trailing newline in the program's output
    $matchFile = trim(`findPrefixes $dictionaryFile`);
    $matches = file($matchFile);

if you write it in c, the chances of beating the perl script are good. also, if you're considering a database solution (rdbms), i recommend sqlite3. it's not nearly as heavy as a server-based solution.

-nathan
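
p.s. for comparison, here is a rough sketch of the pure-php approach from earlier in the thread: a for loop with the dictionary length cached in a variable, and the strpos() === 0 prefix test. the function name matchingPrefixes and its arguments are made up for illustration, it assumes the dictionary is a plain 0-indexed array of strings, and it is written for current php (7+), so treat it as a starting point rather than a tuned implementation.

```php
<?php
// Hypothetical helper (not from the thread): returns every dictionary
// entry that is a prefix of $haystack.
// Assumes $dictionary is a 0-indexed array of strings.
function matchingPrefixes(array $dictionary, string $haystack): array
{
    $matches = [];
    $count = count($dictionary); // cache the length once, outside the loop

    for ($i = 0; $i < $count; $i++) {
        $prefix = $dictionary[$i];

        // Skip empty entries: strpos() warns on an empty needle before PHP 8.
        if ($prefix === '') {
            continue;
        }

        // The strict === 0 matters: strpos() returns false on no match,
        // and false == 0 would be true with a loose comparison.
        if (strpos($haystack, $prefix) === 0) {
            $matches[] = $prefix;
        }
    }

    return $matches;
}
```

you would call it as matchingPrefixes($dictionary, $word) for each word you need to check. with several million lines this is still a linear scan per lookup, which is why the external program or a database may be the better route.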