Re: Re: need some regex help to strip out // comments but not http:// urls

On Wed, May 29, 2013 at 10:51 PM, Sebastian Krebs <krebs.seb@xxxxxxxxx> wrote:

>
>
>
> 2013/5/29 Matijn Woudt <tijnema@xxxxxxxxx>
>
>> On Wed, May 29, 2013 at 6:08 PM, Sean Greenslade <zootboysean@xxxxxxxxx> wrote:
>>
>> > On Wed, May 29, 2013 at 9:57 AM, Jonesy <gmane@xxxxxxxx> wrote:
>> > > On Tue, 28 May 2013 14:17:06 -0700, Daevid Vincent wrote:
>> > >> I'm adding some minification to our cache.class.php and am running into an
>> > >> edge case that is causing me grief.
>> > >>
>> > >> I want to remove all comments of the // variety, HOWEVER I don't want to
>> > >> remove URLs...
>> > >
>> > > KISS.
>> > >
>> > > To make it simple, straight-forward, and understandable next year when I
>> > > have to re-read what I've written:
>> > >
>> > > I'd change all "://" to "QqQ"  -- or any unlikely text string.
>> > >
>> > > Then I'd do whatever needs to be done to the "//" occurrences.
>> > >
>> > > Finally, I'd change all "QqQ" back to "://".
>> > >
>> > > Jonesy
>> >
>> > Wow. This is just a spectacularly bad suggestion.
>> >
>> > First off, this task is probably a bit beyond the capabilities of a
>> > regex. Yes, you may be able to come up with something that works 99%
>> > of the time, but this is really a job for a parser of some sort. I'm
>> > sorry I don't have any suggestions on exactly where to go with that,
>> > however I'm sure Google can be of assistance. The main problem is that
>> > regex doesn't understand context. It just blindly finds patterns. A
>> > parser understands context, and can figure out which //'s are comments
>> > and which are something else. As a bonus, it can probably understand
>> > other forms of comments like /* */, which regex would completely die
>> > on.
>> >
>> >
>> It is possible to write a whole parser as a single regex, albeit a terribly
>> long and complex one.
>>
>
> No, it isn't.
>


A bare "No, it isn't" won't convince anyone; an argument would help. Thinking
it over, the objection does make sense in theory: a true regular expression
can only describe a regular language, and as far as I know programming
languages are generally not regular (arbitrarily nested constructs such as
balanced brackets already rule that out).
But in PHP we have PCRE with extensions such as recursive patterns[1] and back
references[2], which can describe much more than a regular language, and I do
believe it would be able to handle this.
Too bad a regular expression like that would probably take months to complete.
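
For the much smaller problem the original poster actually has (drop //
comments, keep URLs), a single pass with one alternation goes a long way. What
follows is only a rough sketch under my own assumptions, not a parser: it is
written in Python for brevity, the names TOKEN and strip_line_comments are made
up for illustration, and it knows nothing about /* */ blocks, heredocs, or
JavaScript regex literals. The idea is to match quoted strings and scheme://
tokens first and put them back untouched, so that only a // found outside of
them is treated as a comment.

```python
import re

# Alternatives are tried left to right at each position:
#   group 1: a double- or single-quoted string literal (kept as-is)
#   group 2: a scheme://... token such as http://php.net (kept as-is)
#   no group: a // comment running to the end of the line (dropped)
TOKEN = re.compile(
    r"""
    ("(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')     # 1: string literal
    | (\b[a-zA-Z][a-zA-Z0-9+.-]*://[^\s"'<>]*)    # 2: URL-like token
    | //[^\n]*                                    # line comment
    """,
    re.VERBOSE,
)


def strip_line_comments(source: str) -> str:
    """Remove // comments that sit outside string literals and URLs."""

    def keep_or_drop(match: re.Match) -> str:
        # Strings and URLs are returned unchanged; a comment match has
        # neither group set and is replaced by the empty string.
        return match.group(1) or match.group(2) or ""

    return TOKEN.sub(keep_or_drop, source)


if __name__ == "__main__":
    sample = '$url = "http://example.com/a//b"; // set the url\n'
    print(strip_line_comments(sample), end="")
```

Whitespace in front of a removed comment is left in place, so a minifier would
still want to trim line ends afterwards. In PHP the same pattern should carry
over to preg_replace_callback() with the x modifier, though I have not tested
that here.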

- Matijn

[1] http://php.net/manual/en/regexp.reference.recursive.php
[2] http://php.net/manual/en/regexp.reference.back-references.php
