Re: stripping enclosed text from PHP code


 



Winfried Meining wrote:
I am writing a script that parses a PHP script and finds all function calls, to check whether those functions exist. To do this, I needed a function that strips out all text enclosed in apostrophes or quotation marks. This is somewhat tricky, as the script needs to know what really is enclosed text and what is PHP code. Apostrophes inside quotation-mark-enclosed text should be ignored, and so should quotation marks inside apostrophe-enclosed text. Similarly, escaped apostrophes in apostrophe-enclosed text and escaped quotation marks in quotation-mark-enclosed text should be ignored. The following function uses preg_match to do this job.
[···]
I noticed that this function is very slow, in particular because
preg_match("/^(.*)some_string(.*)$/", $text, $matches);

always seems to find the *last* occurrence of some_string and not the *first* (I would need the first). There is certainly a way to write another version where one looks at every single character when going through $text, but this would make the code much more complex.

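IIRC, the regexp engine scans the string from left to right, but a greedy quantifier first takes as much as it can and then gives characters back from the right: the engine goes through the string while the first part matches, and steps backwards every time the next part fails to match, and so on for the whole expression. That is what "(?>...)", "once-only subpatterns", exist for: they stop the engine from backing up into a part that has already matched.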

Now, you're telling it (from left to right) to look for "any sequence of any characters, followed by 'some_string', followed once again by any sequence of any characters". The first "(.*)" keeps consuming characters until it fails, which is at the end of the string (or of the line, since "." does not match a newline by default). Only then does the engine try to match "some_string"; if that fails, it backs up one character and tries "some_string" again from there, and so on until it matches (or until "(.*)" has backed up all the way to the beginning of the string). Because the search for "some_string" starts at the end and moves backwards, the match you get is the *last* occurrence. Once "some_string" matches, the same procedure is repeated for the second "(.*)" in the pattern, so the whole process is quite slow.
--I hope this made some sense to you (somehow it stopped making sense to me)
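
To make the backing-up behaviour a bit more concrete, here is a minimal sketch with a made-up subject string ($subject and both patterns are only illustrations, they are not from the original script). It compares an ordinary repetition with a once-only subpattern:

```php
<?php
// Hypothetical subject: three "a" characters followed by "b".
$subject = 'aaab';

// Ordinary repetition: a+ first takes "aaa", then gives one "a" back
// so that the literal "ab" after it can match.
$plain = preg_match('/a+ab/', $subject);

// Once-only subpattern: (?>a+) keeps everything it has taken, so the
// literal "ab" after it can never match and the pattern fails.
$atomic = preg_match('/(?>a+)ab/', $subject);

var_dump($plain, $atomic); // int(1), int(0)
```

The second pattern fails on purpose: it only shows that nothing is given back once the group has matched, which is what saves the repeated retries on long inputs.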

Also, by default regexps are "greedy", which means the "+" and "*" meta-characters match as much as they can. In your case you will most likely need to limit this behaviour by making them "ungreedy" (that is, the engine tries to match the next part of the pattern after each character that "+"/"*" consumes). You can do this by adding a "?" after these meta-characters (e.g. ".+?-").
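
A small sketch of the difference, using an invented $text that contains the marker twice (the sample text and variable names are assumptions for illustration). It also shows strpos() as a possible non-regex route when the marker is a fixed string, as it is in your example:

```php
<?php
// Hypothetical sample text with two occurrences of the marker.
$text = 'aaa some_string bbb some_string ccc';

// Greedy: the first (.*) takes as much as it can, so the match
// settles on the LAST occurrence of some_string.
preg_match('/^(.*)some_string(.*)$/', $text, $greedy);

// Ungreedy: (.*?) grows one character at a time, so the match
// settles on the FIRST occurrence.
preg_match('/^(.*?)some_string(.*)$/', $text, $lazy);

// Fixed marker: a plain string search finds the first occurrence
// without involving the regexp engine at all.
$pos    = strpos($text, 'some_string');
$before = substr($text, 0, $pos);
$after  = substr($text, $pos + strlen('some_string'));

var_dump($greedy[1]); // string(20) "aaa some_string bbb "
var_dump($lazy[1]);   // string(4) "aaa "
```

In real code strpos() can return FALSE when the marker is absent, so that case would need a check before the substr() calls.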

I wonder, if there is a faster *and* simple way to do the same thing.

	Mhh...  what about something like
  preg_replace('/(["\']).*?(?<!\\\)\\1/X', '', $code)
? It's not 100% accurate, though: if the code contains something like '\\' (a string that ends in an escaped backslash), it will fail --I guess you should replace these before running the regexp
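
One possible way around the '\\' problem, as a sketch rather than a tested replacement: instead of looking behind the closing quote, let the pattern consume every backslash together with the character that follows it. The pattern and the sample $code below are my own assumptions, not something taken from your script:

```php
<?php
// Hypothetical input; a nowdoc keeps the backslashes exactly as written.
$code = <<<'CODE'
echo 'it\'s', "say \"hi\"", 'back\\'; foo('x');
CODE;

// Opening quote, then any mix of "backslash + any character" or
// "a character that is neither a backslash nor the opening quote",
// then the same quote again. The s modifier lets strings span lines.
$stripped = preg_replace('/(["\'])(?:\\\\.|(?!\\1)[^\\\\])*\\1/s', '', $code);

// The lookbehind-based pattern, for comparison (X modifier left out).
$naive = preg_replace('/(["\']).*?(?<!\\\)\\1/', '', $code);

var_dump($stripped); // string(17) "echo , , ; foo();"
var_dump($naive);    // differs: 'back\\' is not closed where it should be
```

This still knows nothing about comments or heredocs, so it remains an approximation.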

After experimenting a little, I found that the code below seems to work acceptably; you may want to try it:
  /**
   * Returns an array with the identified function-calls found
   * (including "function declarations", e.g. "function my_func")
   *
   * @param     string       $code  PHP source code to scan
   * @return    array|false  matched names, or FALSE if none were found
   * @since     Mon Apr 10 01:13:28 CDT 2006
   * @author    rsalazar
   */
  function getFunctionCalls( $code ) {
      $result = FALSE;
      // try to strip away literal strings: an opening quote, the shortest
      // run of characters, then the same quote not preceded by a backslash
      $code   = preg_replace('/(["\']).*?(?<!\\\)\\1/X', '', $code);
      // look for "function calls": a word (optionally preceded by the
      // "function" keyword) followed by optional whitespace and "("
      if ( preg_match_all('/(?>((?:(?<=\b)function\s+)?\w+)\s*)\(/Xi',
                          $code, $arr_matches) ) {
          $result = $arr_matches[1];
      }
      return  $result;
  } // getFunctionCalls()
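
If the regexps turn out to be too fragile, a different technique (not the one used above) would be to let PHP's own tokenizer find the strings for you. The following is only a sketch: getFunctionCallsTokenized() and $sample are hypothetical names, it assumes PHP 7 or later with the tokenizer extension available, and the code passed in must start with an opening <?php tag.

```php
<?php
/**
 * Collects every T_STRING token that is directly followed by "("
 * (ignoring whitespace in between). Hypothetical helper.
 */
function getFunctionCallsTokenized(string $code): array {
    $names  = [];
    $tokens = token_get_all($code);
    $count  = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        // Plain function names are reported as T_STRING tokens.
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING) {
            continue;
        }
        // Skip whitespace between the name and a possible "(".
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j])
               && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j < $count && $tokens[$j] === '(') {
            $names[] = $tokens[$i][1];
        }
    }
    return $names;
}

// Hypothetical sample: the quoted text looks like calls but is not code.
$sample = <<<'PHP'
<?php
function my_func($a) { return strlen($a); }
echo "not_a_call(1)", my_func('also_not(2)');
PHP;

$calls = getFunctionCallsTokenized($sample);
print_r($calls); // my_func, strlen, my_func
```

Quoted text never shows up as T_STRING, so nothing has to be stripped first. The sketch still reports method names and class names in "new Foo(...)", which would have to be filtered out before checking with function_exists().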

I recommend you read:
-> http://php.net/pcre
 > http://php.net/manual/en/reference.pcre.pattern.syntax.php
 > http://php.net/manual/en/reference.pcre.pattern.modifiers.php
--
Regards,
J. Rafael Salazar Magaña
Innox - Innovación Inteligente
Tel: +52 (33) 3615 5348 ext. 205 / 01 800 2-SOFTWARE
http://www.innox.com.mx

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


