Re: Re: need some regex help to strip out // comments but not http:// urls

Matijn Woudt <tijnema@xxxxxxxxx> wrote:

>On Wed, May 29, 2013 at 10:51 PM, Sebastian Krebs <krebs.seb@xxxxxxxxx> wrote:
>
>> 2013/5/29 Matijn Woudt <tijnema@xxxxxxxxx>
>>
>>> On Wed, May 29, 2013 at 6:08 PM, Sean Greenslade <zootboysean@xxxxxxxxx> wrote:
>>>
>>> > On Wed, May 29, 2013 at 9:57 AM, Jonesy <gmane@xxxxxxxx> wrote:
>>> > > On Tue, 28 May 2013 14:17:06 -0700, Daevid Vincent wrote:
>>> > >> I'm adding some minification to our cache.class.php and am running
>>> > >> into an edge case that is causing me grief.
>>> > >>
>>> > >> I want to remove all comments of the // variety, HOWEVER I don't want
>>> > >> to remove URLs...
>>> > >
>>> > > KISS.
>>> > >
>>> > > To make it simple, straightforward, and understandable next year when
>>> > > I have to re-read what I've written:
>>> > >
>>> > > I'd change all "://" to "QqQ"  -- or any unlikely text string.
>>> > >
>>> > > Then I'd do whatever needs to be done to the "//" occurrences.
>>> > >
>>> > > Finally, I'd change all "QqQ" back to "://".
>>> > >
>>> > > Jonesy
>>> >
>>> > Wow. This is just a spectacularly bad suggestion.
>>> >
>>> > First off, this task is probably a bit beyond the capabilities of a
>>> > regex. Yes, you may be able to come up with something that works 99%
>>> > of the time, but this is really a job for a parser of some sort. I'm
>>> > sorry I don't have any suggestions on exactly where to go with that,
>>> > however I'm sure Google can be of assistance. The main problem is that
>>> > regex doesn't understand context. It just blindly finds patterns. A
>>> > parser understands context, and can figure out which //'s are comments
>>> > and which are something else. As a bonus, it can probably understand
>>> > other forms of comments like /* */, which regex would completely die
>>> > on.
>>> >
>>> >
>>> It is possible to write a whole parser as a single regex, albeit a
>>> terribly long and complex one.
>>>
>>
>> No, it isn't.
>>
>
>
>It's better if you throw some smart words on the screen if you want to
>convince someone. Just thinking about it, it makes sense, as a true regular
>expression can only describe a regular language, and I don't think any
>programming language is a regular language.
>But we have PHP PCRE with extensions like recursive patterns[1] and
>back references[2], which can describe much more than just a regular
>language. And I do believe it would be able to handle it.
>Too bad it would probably take months to complete a regular expression
>like this.
>
>- Matijn
>
>[1] http://php.net/manual/en/regexp.reference.recursive.php
>[2] http://php.net/manual/en/regexp.reference.back-references.php
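
As a rough illustration of the PCRE extensions mentioned above (this is not
code from the thread): the sketch below uses a back reference to match a whole
quoted string, then the backtracking verbs (*SKIP)(*FAIL) to throw that match
away, so only a // found outside a string gets removed. The function name
strip_line_comments_pcre is made up for the example, and it assumes URLs
always sit inside quotes. It knows nothing about /* */ comments or regex
literals.

```php
<?php
// Hypothetical helper, not from the thread. Assumes every URL in the input
// is inside a single- or double-quoted string.
function strip_line_comments_pcre(string $src): string
{
    $pattern = '~
        (["\'])                # opening quote, captured as group 1
        (?:\\\\.|(?!\1).)*+    # an escaped char, or anything but that quote
        \1                     # the same quote again (back reference)
        (*SKIP)(*FAIL)         # consume the string, then reject the match
      |
        //[^\n]*               # a // comment outside any string
    ~xs';

    // preg_replace returns null on a PCRE error; fall back to the input.
    return preg_replace($pattern, '', $src) ?? $src;
}

$in = "var u = \"http://example.com\"; // trailing note\nx = 1;";
echo strip_line_comments_pcre($in), "\n";
```

An unquoted http://... would still be eaten from the // onward, which is the
sort of edge case that keeps such patterns growing.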

Sometimes when all you know is regex, everything looks like a nail...
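
For comparison, a minimal sketch of the context-aware approach, again not code
from the thread: a hand-written scanner that remembers whether it is inside a
quoted string. The name strip_line_comments_scan is hypothetical, and treating
a // that directly follows ':' as part of a URL is a simplifying assumption
(it would wrongly keep a comment written as "case 1://note"). It also assumes
JavaScript-like input, so a stray apostrophe in plain prose would be read as
the start of a string.

```php
<?php
// Hypothetical helper, not from the thread: a tiny hand-written scanner.
function strip_line_comments_scan(string $src): string
{
    $out   = '';
    $quote = null;          // the open quote character, or null outside strings
    $len   = strlen($src);

    for ($i = 0; $i < $len; $i++) {
        $ch = $src[$i];

        if ($quote !== null) {
            // Inside a string: copy everything, honouring backslash escapes.
            $out .= $ch;
            if ($ch === '\\' && $i + 1 < $len) {
                $out .= $src[++$i];
            } elseif ($ch === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($ch === '"' || $ch === "'") {
            $quote = $ch;
            $out  .= $ch;
        } elseif ($ch === '/' && $i + 1 < $len && $src[$i + 1] === '/'
                  && ($i === 0 || $src[$i - 1] !== ':')) {
            // A // outside a string and not part of "://": skip to end of line.
            while ($i < $len && $src[$i] !== "\n") {
                $i++;
            }
            if ($i < $len) {
                $out .= "\n";   // keep the line break itself
            }
        } else {
            $out .= $ch;
        }
    }
    return $out;
}

$in = "a = 'http://x.y'; // note\nb = http://z // gone\nc";
echo strip_line_comments_scan($in), "\n";
```

Handling /* */ would mean adding one more state to the same loop, which is
easier to reason about than another alternation in a pattern.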

Thanks,
Ash

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




