Re: [RFC PATCH 3/3] git-grep: Learn PCRE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



MichaÅ Kiedrowicz <michal.kiedrowicz@xxxxxxxxx> writes:

> This patch lacks tests.
>
> AFAIK grep.c is a part of libgit.a, so libpcre is linked with the git
> itself, which bloats it. I don't like it, especially because it makes
> 'make test' take:
>
> 	real    6m27.558s
> 	user    1m34.392s
> 	sys     2m11.029s
>
> instead of
>
> 	real    3m41.322s
> 	user    1m44.005s
> 	sys     2m32.403s
>
> on my PC.

I am not against pcre, but the performance numbers look rather yucky.

> diff --git a/grep.c b/grep.c
> index d67baf9..4f5fcbb 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -70,6 +70,30 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
>  	if (p->fixed)
>  		return;
>  
> +	if (opt->pcre) {
> +#ifdef NO_LIBPCRE
> +		die("cannot use Perl-compatible regexes when libpcre is not compiled in");
> +#else
> +		const char *error;
> +	...
> +		return;
> +#endif
> +	}
> +

Please avoid these #ifdefs in the middle of otherwise generic function.

Make the above into something like:

	if (opt->pcre) {
        	compile_pcre(p, opt);
                return;
	}

and then have an extra section entirely devoted to pcre integration that
has bunch of ...

        #ifdef NO_LIBPCRE
        static void compile_pcre_part(struct grep_pat *p, struct grep_opt *opt)
        {
                die("...");
	}
        #else
        static void compile_pcre_part(struct grep_pat *p, struct grep_opt *opt)
        {
                ...
	}
        #endif

... that is totally separate from the generic part of the codebase.  They
could even be in a separate file, if you need numerous helpers.

> @@ -320,7 +344,16 @@ void free_grep_patterns(struct grep_opt *opt)
>  		case GREP_PATTERN: /* atom */
>  		case GREP_PATTERN_HEAD:
>  		case GREP_PATTERN_BODY:
> +#ifndef NO_LIBPCRE
> +			if (p->re) {
> +				pcre_free(p->re);
> +				pcre_free(p->extra);
> +			} else {
> +				regfree(&p->regexp);
> +			}
> +#else
>  			regfree(&p->regexp);
> +#endif
>  			break;
>  		default:
>  			break;

> diff --git a/grep.h b/grep.h
> index 06621fe..cd202a9 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -1,6 +1,9 @@
>  #ifndef GREP_H
>  #define GREP_H
>  #include "color.h"
> +#ifndef NO_LIBPCRE
> +#include <pcre.h>
> +#endif /* NO_LIBPCRE */

This part might want to do something like

	#ifndef NO_LIBPCRE
        #include <pcre.h>
        #else
        typedef int pcre; /* dummy */
        typedef int pcre_extra; /* dummy */
        #endif

if it makes it easier to keep the generic part of the code and structure
definition clean.

For example, the "free" part of your patch above then would become

	if (p->pcre_regexp)
        	free_pcre_part(p);
	else
        	regfree(&p->regexp);

with the conditional enclosed entirely within the implementation of
the free_pcre_part() helper function.

Also by doing so, the struct definition below can lose #ifndef, and can become:

	 struct grep_pat {
         ...
        +	pcre *pcre_regexp;
        +	pcre_extra *pcre_extra_info;
	 ...
	 };

> @@ -33,6 +36,10 @@ struct grep_pat {
>  	size_t patternlen;
>  	enum grep_header_field field;
>  	regex_t regexp;
> +#ifndef NO_LIBPCRE
> +	pcre *re;
> +	pcre_extra *extra;
> +#endif /* NO_LIBPCRE */
>  	unsigned fixed:1;
>  	unsigned ignore_case:1;
>  	unsigned word_regexp:1;
> @@ -83,6 +90,7 @@ struct grep_opt {
>  #define GREP_BINARY_TEXT	2
>  	int binary;
>  	int extended;
> +	int pcre;
>  	int relative;
>  	int pathname;
>  	int null_following_name;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]