> -----Original Message-----
> From: Andreas Perstinger [mailto:andipersti@xxxxxxxxx]
> Sent: Tuesday, May 28, 2013 11:10 PM
> To: php-general@xxxxxxxxxxxxx
> Subject: Re: need some regex help to strip out // comments but not
> http:// urls
>
> On 28.05.2013 23:17, Daevid Vincent wrote:
> > I want to remove all comments of the // variety, HOWEVER I don't want to
> > remove URLs...
> >
>
> You need a negative look behind assertion
> ( http://www.php.net/manual/en/regexp.reference.assertions.php ).
>
> "(?<!http:)//" will match "//" only if it isn't preceded by "http:".
>
> Bye, Andreas

This worked like a CHAMP Andreas my friend! You are a regex guru!

> -----Original Message-----
> From: Sean Greenslade [mailto:zootboysean@xxxxxxxxx]
> Sent: Wednesday, May 29, 2013 10:28 AM
>
> Also, (I haven't tested it, but) I don't think that example you gave
> would work. Without any sort of quoting around the "http://", I would
> assume the JS interpreter would take that double slash as a
> comment starter. Do tell me if I'm wrong, though.

You're wrong Sean. :-p

This regex works in all cases listed in my example target string.

    \s*(?<!:)//.*?$

Or in my actual compress() method:

    $sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

Target test case with intentional traps:

    // another comment here
    <iframe src="http://foo.com">
    function bookmarksite(title,url){
        if (window.sidebar) // firefox
            window.sidebar.addPanel(title, url, "");
        else if(window.opera && window.print){ // opera
            var elem = document.createElement('a');
            elem.setAttribute('href',url);
            elem.setAttribute('title',title);
            elem.setAttribute('rel','sidebar');
            elem.click();
        }
        else if(document.all)// ie
            window.external.AddFavorite(url, title);
    }

And for those interested here is the whole method...

public function compress($sBlob) {
    //remove C style /* */ blocks as well as PHPDoc /** */ blocks
    $sBlob = preg_replace("@/\*(.*?)\*/@s",'',$sBlob);
    //$sBlob = preg_replace("/\*[^*]*\*+(?:[^*/][^*]*\*+)*/s",'',$sBlob);
    //$sBlob = preg_replace("/\\*(?:.|[\\n\\r])*?\\*/s",'',$sBlob);

    //remove // or # style comments at the start of a line possibly redundant with next preg_replace
    $sBlob = preg_replace("@^\s*((^\s*(#+|//+)\s*.+?$\n)+)@m",'',$sBlob);

    //remove // style comments that might be tagged onto valid code lines. we don't try for # style as that's risky and not widely used
    // @see http://www.php.net/manual/en/regexp.reference.assertions.php
    $sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

    if (in_array($this->_file_name_suffix, array('html','htm'))) {
        //remove <!-- --> blocks
        $sBlob = preg_replace("/<!--[^\[](.*?)-->/s",'',$sBlob);

        //if Tidy is enabled...
        //if (!extension_loaded('tidy')) dl( ((PHP_SHLIB_SUFFIX === 'dll') ? 'php_' : '') . 'tidy.' . PHP_SHLIB_SUFFIX);
        if (FALSE && extension_loaded('tidy')) {
            //use Tidy to clean up the rest. There may be some redundancy with the above, but it shouldn't hurt
            //See all parameters available here: http://tidy.sourceforge.net/docs/quickref.html
            $tconfig = array(
                'clean' => true,
                'hide-comments' => true,
                'hide-endtags' => true,
                'drop-proprietary-attributes' => true,
                'join-classes' => true,
                'join-styles' => true,
                'quote-marks' => false,
                'fix-uri' => false,
                'numeric-entities' => true,
                'preserve-entities' => true,
                'doctype' => 'omit',
                'tab-size' => 1,
                'wrap' => 0,
                'wrap-php' => false,
                'char-encoding' => 'raw',
                'input-encoding' => 'raw',
                'output-encoding' => 'raw',
                'ascii-chars' => true,
                'newline' => 'LF',
                'tidy-mark' => false,
                'quiet' => true,
                'show-errors' => ($this->_debug ? 6 : 0),
                'show-warnings' => $this->_debug,
            );

            if ($this->_log_messages) $tconfig['error-file'] = DBLOGPATH.'/'.$this->get_file_name().'_tidy.log';

            $tidy = tidy_parse_string($sBlob, $tconfig, 'utf8');
            $tidy->cleanRepair();
            $sBlob = tidy_get_output($tidy);

            /*
            //FIXME: [dv] this is an attempted hack to restore what Tidy fucks up...
            //http://lists.w3.org/Archives/Public/html-tidy/2013AprJun/ should be a message from me on 2013-05-01
            //$sBlob = str_replace(array('&lt;?=', '?&gt;', '-&gt;'), array('<?=', '?>', '->'), $sBlob);
            //$sBlob = str_replace(array('&lt;', '&gt;'), array('<', '>'), $sBlob);
            */
        }
    }

    //condense multiple white spaces with a single white space
    //http://stackoverflow.com/questions/1981349/regex-to-replace-multiple-spaces-with-a-single-space
    //http://stackoverflow.com/questions/2326125/remove-multiple-whitespaces-in-php
    $sBlob = preg_replace('/\s+/', ' ', $sBlob);

    //ini_set('xdebug.var_display_max_data', -1); var_dump($sBlob);

    return $sBlob;
}

I never was able to get Tidy to not dick with my < and > chars unfortunately, however I think even without it, I get most of what I was looking to accomplish, so I'm not crying over it.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
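
For anyone comparing the two lookbehinds quoted in this thread, here is a minimal sketch that runs both against a few single lines. The helper name strip_line_comments() is hypothetical, and the https line is a made-up extra case that is not part of the original test string; only the two patterns come from the messages above.

```php
<?php
// Hypothetical helper for illustration only; it just wraps preg_replace().
function strip_line_comments(string $source, string $pattern): string
{
    return preg_replace($pattern, '', $source);
}

// Lookbehind as suggested by Andreas: "//" is left alone only when the
// five characters before it are exactly "http:".
$narrow = '@\s*(?<!http:)//.*?$@m';

// Lookbehind as used in compress(): "//" is left alone whenever the
// single character before it is ":".
$broad = '@\s*(?<!:)//.*?$@m';

$lines = array(
    'if (window.sidebar) // firefox',  // trailing comment from the test case
    '<iframe src="http://foo.com">',   // URL from the test case
    '<a href="https://foo.com">',      // assumed extra case, not in the original
);

foreach ($lines as $line) {
    echo "input : ", $line, "\n";
    echo "narrow: ", strip_line_comments($line, $narrow), "\n";
    echo "broad : ", strip_line_comments($line, $broad), "\n\n";
}
```

Both patterns strip the trailing comment and keep the http:// URL. They differ on the https line: the narrow lookbehind sees "ttps:" rather than "http:" before the slashes and cuts the line off there, while the single-character check for ":" leaves it intact.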