Re: Finding matching braces with regular expressions

Hi Bruno and Cameron,

On Mon, Mar 19, 2012 at 07:04, Cameron Simpson <cs@xxxxxxxxxx> wrote:
> On 18Mar2012 20:19, Bruno Wolff III <bruno@xxxxxxxx> wrote:
> | On Sun, Mar 18, 2012 at 23:46:17 +0100,
> |    suvayu ali <fatkasuvayu+linux@xxxxxxxxx> wrote:
> | >
> | >I'm trying to write a regular expression that matches function and class
> | >definitions in C/C++ and defuns in lisp code. I intend to use it with
> | >sed and `git blame'. My first attempt relies on indentation. That
> | >obviously breaks rather often.
> |
> | Mathematically, regular expressions can't match braces like this to an
> | unbounded depth. You might be able to use extensions to common regular
> | expression implementations that aren't strictly regular expressions to do this.
>
> In particular, regular expressions are not recursive, hence not capable
> of matching arbitrarily nested constructs. But you _can_ construct one
> to match up to a certain depth. For example, four or five levels deep
> probably covers most things in reasonable code (Lisp excluded; it is
> naturally very bracket intensive).
>
> If you're using sed you probably want the "extended regular expressions
> mode", turned on by -E in GNU sed IIRC.
>
> Personally, for ease of debugging, I would construct the regexp from
> smaller pieces in shell. Untested example:
>
>  no_br='[^()]*'                            # no brackets at all
>  upto1="${no_br}(\(${no_br}\)${no_br})*"   # text and groups, 1 level deep
>  upto2="${no_br}(\(${upto1}\)${no_br})*"   # ... up to 2 levels deep
>  upto3="${no_br}(\(${upto2}\)${no_br})*"
>  upto4="${no_br}(\(${upto3}\)${no_br})*"
>  sed -E -n "/^${upto4}\$/!p" <blame-data >blame-bad-bracketing
>
> | It might be easier to write your own parser. There are tools, like flex
> | and bison to help with this.
>
> Indeed. Or you could write a hand-rolled recursive descent parser in
> whatever language you like, provided it makes character-by-character
> access fairly easy (C, Python, etc.; not awk or sed).
>

I was expecting this to be the case; after all, using regular expressions
is not the same as using a programming language. I'll think about how I
can adapt my use case to your suggestions. Thanks a lot for the
different pointers. :)

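Cameron's hand-rolled parser idea can also be sketched in bash, which does offer (slow) character access through `${var:offset:1}`. This is a rough sketch under stated assumptions, not a real C/C++ or Lisp parser: only (), {} and [] are considered, strings and comments are ignored, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Recursive descent check for balanced (), {} and [] in a single string.
# Grammar assumed by this sketch:
#   seq   := { other-char | group }
#   group := opener seq matching-closer
# Strings and comments are NOT handled; every bracket character counts.

src=''  # text being parsed
pos=0   # index of the next unread character in $src

# parse_seq: consume characters and groups until end of input or a closer.
# The closer itself is left for the caller to check.
parse_seq() {
  local c
  while (( pos < ${#src} )); do
    c=${src:pos:1}
    case $c in
      '(' | '{' | '[') parse_group || return 1 ;;
      ')' | '}' | ']') return 0 ;;
      *) pos=$((pos + 1)) ;;
    esac
  done
  return 0
}

# parse_group: called at an opener; parse the inner seq, then require
# the matching closer.
parse_group() {
  local want
  case ${src:pos:1} in
    '(') want=')' ;;
    '{') want='}' ;;
    '[') want=']' ;;
  esac
  pos=$((pos + 1))
  parse_seq || return 1
  [[ ${src:pos:1} == "$want" ]] || return 1  # wrong closer or end of input
  pos=$((pos + 1))
}

# balanced STRING: exit status 0 if every bracket in STRING is matched.
balanced() {
  src=$1
  pos=0
  parse_seq || return 1
  (( pos == ${#src} ))  # a stray closer at top level stops parse_seq early
}

for s in 'int f(int a) { return g(a[0]); }' 'f(g[x)]' 'a)'; do
  if balanced "$s"; then echo "ok:  $s"; else echo "bad: $s"; fi
done
```

The loop reports "ok" for the first string and "bad" for the other two. A global position counter shared by mutually recursive functions is the usual shape of a recursive descent parser; in C or Python the same structure runs far faster.
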
Cheers,

-- 
Suvayu

Open source is the future. It sets us free.
-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org

