Re: [PATCH v3 1/5] version: refactor strbuf_sanitize()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Christian Couder <christian.couder@xxxxxxxxx> writes:

> +/*
> + * Trim and replace each character with ascii code below 32 or above
> + * 127 (included) using a dot '.' character. Useful for sending
> + * capabilities.
> + */
> +void strbuf_sanitize(struct strbuf *sb);

I am not getting "Useful for sending capabilities" here, and feel
that it is somewhat an unsubstantiated claim.  If some information
is going to be transferred (which the phrase "sending capabilities"
hints), I'd expect that we try as hard as possible not to lose
information, but redact-non-ASCII is the total opposite of "not
losing information".

> diff --git a/version.c b/version.c
> index 41b718c29e..951e6dca74 100644
> --- a/version.c
> +++ b/version.c
> @@ -24,15 +24,10 @@ const char *git_user_agent_sanitized(void)
>  
>  	if (!agent) {
>  		struct strbuf buf = STRBUF_INIT;
> -		int i;
>  
>  		strbuf_addstr(&buf, git_user_agent());
> -		strbuf_trim(&buf);
> -		for (i = 0; i < buf.len; i++) {
> -			if (buf.buf[i] <= 32 || buf.buf[i] >= 127)
> -				buf.buf[i] = '.';
> -		}
> -		agent = buf.buf;
> +		strbuf_sanitize(&buf);
> +		agent = strbuf_detach(&buf, NULL);
>  	}
>  
>  	return agent;

This is very faithful rewrite of the original.  The original had a
strbuf on stack, and after creating user-agent string in it, a
function scope static variable "agent" is made to point at it and
then the stack the strbuf was on is allowed to go out of scope.
Since the variable "agent" is holding onto the piece of memory, the
leak checker does not complain about anything.  The rewritten
version is leak-free for exactly the same reason, but because it
calls strbuf_detach() before the strbuf goes out of scope to
officially transfer the ownership to the variable "agent", it tells
what is going on to readers a lot more clearly.

Nicely done.

By the way, as we are trimming, I am very very much tempted to
squish a run of non-ASCII bytes into one dot, perhaps like

	void redact_non_printables(struct strbuf *sb)
	{
		size_t dst = 0;
                int skipped = 0;

        	strbuf_trim(sb);
		for (size_t src = 0; src < sb->len; src++) {
                	int ch = sb->buf[src];
			if (ch <= 32 && 127 <= ch) {
                                if (skipped)
                                	continue;
                        	ch = '.';
			}
                        sb->buf[dst++] = ch;
                        skipped = (ch == '.');
		}
	}

or even without strbuf_trim(), which would turn any leading or
trailing run of whitespaces into '.'.

But that is an improvement that can be easily done on top after the
dust settles and better left as #leftoverbits material.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux