RE: [SOLVED] need some regex help to strip out // comments but not http:// urls

> -----Original Message-----
> From: Andreas Perstinger [mailto:andipersti@xxxxxxxxx]
> Sent: Tuesday, May 28, 2013 11:10 PM
> To: php-general@xxxxxxxxxxxxx
> Subject: Re:  need some regex help to strip out // comments but not
> http:// urls
> 
> On 28.05.2013 23:17, Daevid Vincent wrote:
> > I want to remove all comments of the // variety, HOWEVER I don't want to
> > remove URLs...
> >
> 
> You need a negative look behind assertion
> ( http://www.php.net/manual/en/regexp.reference.assertions.php ).
> 
> "(?<!http:)//" will match "//" only if it isn't preceded by "http:".
> 
> Bye, Andreas

This worked like a CHAMP Andreas my friend! You are a regex guru!

> -----Original Message-----
> From: Sean Greenslade [mailto:zootboysean@xxxxxxxxx]
> Sent: Wednesday, May 29, 2013 10:28 AM
>
> Also, (I haven't tested it, but) I don't think that example you gave
> would work. Without any sort of quoting around the "http://", I would
> assume the JS interpreter would take that double slash as a comment
> starter. Do tell me if I'm wrong, though.

You're wrong Sean. :-p

This regex works in all cases listed in my example target string.

\s*(?<!:)//.*?$

Or in my actual compress() method:

$sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);
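
To make the difference between the two lookbehinds in this thread concrete, here is a minimal sketch. The sample line and the example.com URL are made up for illustration, and the variable names are hypothetical. It suggests why checking for a bare colon is the more forgiving choice: a lookbehind on the literal "http:" does not protect an https:// URL.

```php
<?php
// Hypothetical sample line (not from the original post).
$line = 'var u = "https://example.com"; // note';

// Lookbehind on the literal "http:" (the form suggested earlier in the thread).
// "https://" is preceded by "ttps:", not "http:", so the URL's "//" still matches.
$byScheme = preg_replace('@(?<!http:)//.*$@m', '', $line);

// Lookbehind on a bare colon (the form used in compress()).
// Any "scheme://" is skipped; only the trailing comment is removed.
$byColon = preg_replace('@\s*(?<!:)//.*?$@m', '', $line);

var_dump($byScheme); // string(15) "var u = "https:"
var_dump($byColon);  // string(30) "var u = "https://example.com";"
```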

Target test case with intentional traps:

// another comment here
<iframe src="http://foo.com">
function bookmarksite(title,url){
    if (window.sidebar) // firefox
        window.sidebar.addPanel(title, url, "");
    else if(window.opera && window.print){ // opera
        var elem = document.createElement('a');
        elem.setAttribute('href',url);
        elem.setAttribute('title',title);
        elem.setAttribute('rel','sidebar');
        elem.click();
    } 
    else if(document.all)// ie
        window.external.AddFavorite(url, title);
}
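
For anyone who wants to try this against the traps above, here is a small standalone sketch using a cut-down version of the test case. strip_line_comments() is a hypothetical helper name used only for this example; in the real code the call lives inside compress().

```php
<?php
// Hypothetical helper wrapping the single preg_replace() from this message.
function strip_line_comments(string $sBlob): string
{
    // (?<!:) is a negative lookbehind: "//" matches only when the character
    // immediately before it is not a colon, so "http://" is left alone.
    // The m modifier lets $ match at the end of every line, not just the blob.
    return preg_replace('@\s*(?<!:)//.*?$@m', '', $sBlob);
}

// Cut-down version of the target test case.
$sample = <<<'JS'
// another comment here
<iframe src="http://foo.com">
if (window.sidebar) // firefox
else if(document.all)// ie
JS;

echo strip_line_comments($sample), "\n";
```

The three comments are removed and the iframe URL survives. One caveat that follows from how the lookbehind works: a "//" preceded by anything other than a colon is still treated as a comment, so a protocol-relative URL such as src="//foo.com" would be cut.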


And for those interested here is the whole method...

public function compress($sBlob)
{
	//remove C style /* */ blocks as well as PHPDoc /** */ blocks
	$sBlob = preg_replace("@/\*(.*?)\*/@s",'',$sBlob);
	//$sBlob = preg_replace("/\*[^*]*\*+(?:[^*/][^*]*\*+)*/s",'',$sBlob);
	//$sBlob = preg_replace("/\\*(?:.|[\\n\\r])*?\\*/s",'',$sBlob);

	//remove // or # style comments at the start of a line possibly redundant with next preg_replace
	$sBlob = preg_replace("@^\s*((^\s*(#+|//+)\s*.+?$\n)+)@m",'',$sBlob);
	//remove // style comments that might be tagged onto valid code lines. we don't try for # style as that's risky and not widely used
	// @see http://www.php.net/manual/en/regexp.reference.assertions.php
	$sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

	if (in_array($this->_file_name_suffix, array('html','htm')))
	{
		//remove <!-- --> blocks
		$sBlob = preg_replace("/<!--[^\[](.*?)-->/s",'',$sBlob);

		//if Tidy is enabled...
		//if (!extension_loaded('tidy')) dl( ((PHP_SHLIB_SUFFIX === 'dll') ? 'php_' : '') . 'tidy.' . PHP_SHLIB_SUFFIX);
		if (FALSE && extension_loaded('tidy'))
		{
			//use Tidy to clean up the rest. There may be some redundancy with the above, but it shouldn't hurt
			//See all parameters available here: http://tidy.sourceforge.net/docs/quickref.html
			$tconfig = array(
				'clean' => true,
				'hide-comments' => true,
				'hide-endtags' => true,
				'drop-proprietary-attributes' => true,
				'join-classes' => true,
				'join-styles' => true,
				'quote-marks' => false,
				'fix-uri' => false,
				'numeric-entities' => true,
				'preserve-entities' => true,
				'doctype' => 'omit',
				'tab-size' => 1,
				'wrap' => 0,
				'wrap-php' => false,
				'char-encoding' => 'raw',
				'input-encoding' => 'raw',
				'output-encoding' => 'raw',
				'ascii-chars' => true,
				'newline' => 'LF',
				'tidy-mark' => false,
				'quiet' => true,
				'show-errors' => ($this->_debug ? 6 : 0),
				'show-warnings' => $this->_debug,
			);

			if ($this->_log_messages) $tconfig['error-file'] = DBLOGPATH.'/'.$this->get_file_name().'_tidy.log';

			$tidy = tidy_parse_string($sBlob, $tconfig, 'utf8');
			$tidy->cleanRepair();
			$sBlob = tidy_get_output($tidy);

			/*
			//FIXME: [dv] this is an attempted hack to restore what Tidy fucks up...
			//http://lists.w3.org/Archives/Public/html-tidy/2013AprJun/ should be a message from me on 2013-05-01
			//$sBlob = str_replace(array('&lt;?=', '?&gt;', '-&gt;'), array('<?=', '?>', '->'), $sBlob);
			//$sBlob = str_replace(array('&lt;', '&gt;'), array('<', '>'), $sBlob);
			*/
		}
	}

	//condense multiple white spaces with a single white space
	//http://stackoverflow.com/questions/1981349/regex-to-replace-multiple-spaces-with-a-single-space
	//http://stackoverflow.com/questions/2326125/remove-multiple-whitespaces-in-php
	$sBlob = preg_replace('/\s+/', ' ', $sBlob);

	//ini_set('xdebug.var_display_max_data', -1); var_dump($sBlob);
	return $sBlob;
}
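
One detail of the method above worth spelling out: the // pass depends on line endings (the m modifier and $), so it has to run before the final whitespace collapse. A minimal sketch of what happens if the order is swapped, using a made-up two-line input and hypothetical variable names:

```php
<?php
// Made-up input: a trailing comment on the first of two lines.
$src = "a = 1; // set a\nb = 2;";

$stripComments = '@\s*(?<!:)//.*?$@m'; // same pattern as in compress()
$collapseSpace = '/\s+/';              // same pattern as in compress()

// Order used in compress(): strip comments first, then collapse whitespace.
$good = preg_replace($collapseSpace, ' ', preg_replace($stripComments, '', $src));

// Reversed order: the newline is already gone, so the comment swallows
// everything up to the end of the blob.
$bad = preg_replace($stripComments, '', preg_replace($collapseSpace, ' ', $src));

var_dump($good); // string(13) "a = 1; b = 2;"
var_dump($bad);  // string(6) "a = 1;"
```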

I never was able to get Tidy to not dick with my < and > chars
unfortunately, however I think even without it, I get most of what I was
looking to accomplish, so I'm not crying over it.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




