Re: Breaking up search terms into an array intelligently

Brent Baisley wrote:
It sounds like you are trying to build a full text search string, perhaps for searching a MySQL database?

Actually, I was thinking of doing a non-full-text MySQL search, hence the need to split the words/phrases up so that they could then be fed into individual "WHERE fieldname LIKE '%word%'" SQL fragments. But now I'm thinking it might be a better idea to do a FULLTEXT search (with the indexes set up), especially as what I need it for is a "Google-like" search of many fields at once within a table. It would certainly make for a less complex query...
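
For comparison, a minimal sketch of the LIKE-based approach: each term must match in at least one field, and all terms must match somewhere. The helper name buildLikeWhere and the column names title and body are made up for illustration. The sketch assumes the terms have already been split into an array, that the field names come from a trusted list rather than from user input, and that the query runs through an API supporting "?" placeholders. It also assumes MySQL's default backslash escape character for LIKE.

```php
<?php
// Hypothetical helper (not from the thread): build "field LIKE ?" fragments
// for every term/field pair and collect the matching parameter values.
function buildLikeWhere(array $terms, array $fields) {
    $clauses = array();
    $params  = array();
    foreach ($terms as $term) {
        $perTerm = array();
        foreach ($fields as $field) {
            $perTerm[] = "$field LIKE ?";
            // Escape %, _ and \ so they are matched literally inside LIKE
            $params[]  = '%' . addcslashes($term, '%_\\') . '%';
        }
        // A term may match in any one of the fields...
        $clauses[] = '(' . implode(' OR ', $perTerm) . ')';
    }
    // ...but every term has to match somewhere
    return array(implode(' AND ', $clauses), $params);
}

list($where, $params) = buildLikeWhere(array('php', 'full text'), array('title', 'body'));
// $where:  (title LIKE ? OR body LIKE ?) AND (title LIKE ? OR body LIKE ?)
// $params: %php%, %php%, %full text%, %full text%
```

The WHERE clause grows with terms times fields, which is the complexity that makes a FULLTEXT index attractive. With an empty term list the helper returns an empty string, so the caller has to handle that case.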

Below is the function I came up with a while ago. It's worked fine, although it currently does not check for multiple spaces, but that should be easy to change.

Yup, I've written an enhanced_explode() function that does just this (using preg_split)

It uses a space as a delimiter and it checks for quotes for searching on phrases.

I like the way this bit works! :-)

It returns the search string for a MySQL full text boolean mode search. You should be able to easily adapt it to your specific needs.

There are a few things I'm a bit confused about - e.g. if you're doing a FULLTEXT search, why do you need to split things up at all? In FULLTEXT searches the search string is parsed into words anyway. However, I can see that you've separated all the search values with a "+" by the end, and that you've appended a "*" to the end of search terms, so I assume what you're actually doing is preparing for a BOOLEAN mode FULLTEXT search? Actually, this must be the case, otherwise the phrase search is meaningless! :-)

A boolean search could actually be quite useful for my needs. My only concerns really are: a) results are not sorted by relevance, unlike a normal FULLTEXT search (so what are they ordered by?), and b) if I use the function below, then any user-added operators ("+", "-" etc.) in $searchVal are not really dealt with properly. Am I right in assuming that I can either use the function or allow users to enter their own operators, but not both (apart from quoting phrases)?
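
A rough sketch covering both concerns, under several assumptions. As far as I understand the MySQL manual, boolean-mode results are not automatically sorted by relevance, so without an ORDER BY the row order is not guaranteed. Repeating the MATCH ... AGAINST expression in the SELECT list gives a score to sort on, though how meaningful that score is depends on the MySQL version and storage engine. The table and column names (articles, id, title, body) and both function names are made up for illustration. The operator list in the regular expression is my reading of the boolean-mode operators and may not be complete for every version.

```php
<?php
// Hypothetical helper: replace boolean-mode operator characters with spaces
// so user input cannot change the meaning of the generated search string.
// Double quotes are left alone so phrase searches keep working.
function stripBooleanOperators($input) {
    $clean = preg_replace('/[+\-<>()~*@]/', ' ', $input);
    // Collapse the runs of whitespace the replacement may have left behind
    return trim(preg_replace('/\s+/', ' ', $clean));
}

// Hypothetical query builder. The same MATCH ... AGAINST expression appears
// twice: once to filter the rows and once, aliased as score, so that
// ORDER BY has something to sort on. Both "?" placeholders take the same
// prepared search string.
function buildBooleanSearchSql() {
    return 'SELECT id, title,'
         . ' MATCH(title, body) AGAINST(? IN BOOLEAN MODE) AS score'
         . ' FROM articles'
         . ' WHERE MATCH(title, body) AGAINST(? IN BOOLEAN MODE)'
         . ' ORDER BY score DESC';
}

$clean = stripBooleanOperators('+php  -perl "full text" search*');
// $clean is: php perl "full text" search
```

Stripping the operators first and then adding "+" and "*" programmatically is the "use the function" option. One side effect of this sketch is that hyphenated input such as full-text is split into two words.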

BTW sorry for my ignorance here, I've only recently begun to look into FULLTEXT searches!

thanks

Paul

    function prepFullTextSearch($searchVal) {
        //Split words into list
        $word_List = explode(' ',stripslashes(trim($searchVal)));
        //Step through word list to get search phrases
        $searchItems = array();
        $i           = 0;
        $isPhrase    = false;
        foreach($word_List as $word) {
            //While building a phrase, append the word to the current item
            $searchItems[$i] = trim(($isPhrase?$searchItems[$i].' '.$word:$word));
            //Check for start of Phrase
            if(substr($searchItems[$i],0,1) == '"') {
                $isPhrase = true;
            }
            if(!$isPhrase) {
                //If not building a phrase, append wildcard (*) to end of word
                $searchItems[$i] .= '*';
                $i++;
            } elseif(substr($searchItems[$i],-1) == '"') {
                //End of Phrase: close the current item
                //(elseif avoids reading the not-yet-set next index)
                $isPhrase = false;
                $i++;
            }
        }
        $searchVal = '+'.implode(' +',$searchItems);
        return $searchVal;
    }



On Sep 7, 2005, at 10:54 AM, Paul Groves wrote:

I want to be able to break up a number of search terms typed into an input
box into an array. Simple enough, one would think: just use explode, e.g.


$array = explode(" ", $string);


But what if I want to be able to cope with search terms separated by more
than one space (a common typing error)? This should work:


function enhanced_explode($string) {
 $array = preg_split ("/\s+/", $string);
 return ($array);
}


But what if I want to allow "Google"-type search parameters, so that
something like the following is split into 3 search terms?:
firstsearchterm "second search term" thirdsearchterm
The following code will do the trick, but is slow and doesn't allow for
multiple spaces as the delimiter, nor the possibility of multiple delimiters
(e.g. " ", "+", "," etc.):


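function explode2($delimeter, $string)
{
   // Initialise state up front to avoid undefined-variable notices
   $returnarray    = array();
   $currentelement = '';
   $insidequotes   = false;

   for ($i = 0; $i < strlen($string); $i++)
   {
       // $string[$i] is used instead of $string{$i}, which newer PHP
       // versions no longer accept
       if ($string[$i] == '"')
       {
           // Toggle phrase mode; the quote character itself is dropped
           $insidequotes = !$insidequotes;
       }
       elseif ($string[$i] == $delimeter && !$insidequotes)
       {
           // A delimiter outside quotes ends the current element
           $returnarray[]  = $currentelement;
           $currentelement = '';
       }
       else
       {
           // Ordinary character, or a delimiter inside a quoted phrase
           $currentelement .= $string[$i];
       }
   }

   $returnarray[] = $currentelement;

   return $returnarray;
}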

None of these solutions is ideal. I guess a clever regular expression
(preg_split) could solve this, but I'm not quite sure how. Does anyone have
any ideas? Thanks
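
One possible regular-expression approach, sketched here with preg_match_all rather than preg_split (matching the terms is simpler than describing the gaps between them). The function name splitSearchTerms and its $delimiters parameter are made up for illustration. The sketch assumes that phrases are marked with plain double quotes and that each delimiter is a single character.

```php
<?php
// Hypothetical helper: split a search string into terms, keeping
// double-quoted phrases together. $delimiters lists the single characters
// that separate terms in addition to whitespace.
function splitSearchTerms($string, $delimiters = ',+') {
    $sep = '\s' . preg_quote($delimiters, '/');
    // Either a quoted phrase (captured without its quotes), or a run of
    // characters that are neither separators nor quotes
    $pattern = '/"([^"]*)"|([^' . $sep . '"]+)/';
    preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
    $terms = array();
    foreach ($matches as $m) {
        // Group 2 is only set when the unquoted alternative matched
        $term = isset($m[2]) ? $m[2] : trim($m[1]);
        if ($term !== '') {
            $terms[] = $term;
        }
    }
    return $terms;
}

$terms = splitSearchTerms('firstsearchterm "second search term"   thirdsearchterm');
// $terms: firstsearchterm, second search term, thirdsearchterm
```

Runs of spaces or delimiters are skipped because nothing in the pattern matches them, and empty phrases ("") are dropped. An unterminated quote is ignored, so the words after it come back as separate terms.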

Paul

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php






--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*

  Paul Groves paul.groves@xxxxxxxxxxxxx
  Senior Project Officer, Academic Computing Development Team
  www.oucs.ox.ac.uk/acdt/
  ACDT is part of the Learning Technologies Group
  www.oucs.ox.ac.uk/ltg/

  Oxford University Computing Services | University of Oxford |
  13 Banbury Road | Oxford OX2 6NN | Tel: 01865 273290

 *+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*
