Re: [PATCH 4/5] strbuf_reset: don't write to slopbuf with ThreadSanitizer

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 15 Aug 2017 11:43:48 -0700

Martin Ågren <martin.agren@xxxxxxxxx> writes:

> If two threads have one freshly initialized string buffer each and call
> strbuf_reset on them at roughly the same time, both threads will be
> writing a '\0' to strbuf_slopbuf. That is not a problem in practice
> since it doesn't matter in which order the writes happen. But
> ThreadSanitizer will consider this a race.
>
> When compiling with GIT_THREAD_SANITIZER, avoid writing to
> strbuf_slopbuf. Let's instead assert on the first byte of strbuf_slopbuf
> being '\0', since it ensures the promised invariant of "buf[len] ==
> '\0'". (Writing to strbuf_slopbuf is normally bad, but could become even
> more bad if we stop covering it up in strbuf_reset.)
>
> Signed-off-by: Martin Ågren <martin.agren@xxxxxxxxx>
> ---
>  strbuf.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/strbuf.h b/strbuf.h
> index e705b94db..295654d39 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -153,7 +153,19 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len)
>  /**
>   * Empty the buffer by setting the size of it to zero.
>   */
> +#ifdef GIT_THREAD_SANITIZER
> +#define strbuf_reset(sb)						\
> +	do {								\
> +		struct strbuf *_sb = sb; 				\
> +		_sb->len = 0;						\
> +		if (_sb->buf == strbuf_slopbuf)				\
> +			assert(!strbuf_slopbuf[0]);			\
> +		else							\
> +			_sb->buf[0] = '\0';				\
> +	} while (0)
> +#else
>  #define strbuf_reset(sb)  strbuf_setlen(sb, 0)
> +#endif
>  
>  
>  /**

The strbuf_slopbuf[] is a shared resource that is expected by
everybody to stay a holder of a NUL.  Even though it is defined as
"char [1]", it in spirit ought to be considered const.  And from
that point of view, your new definition that is conditionally used
only when sanitizer is in use _is_ the more correct one than the
current "we do not care if it is slopbuf, we are writing \0 so it
will be no-op anyway" code.

I wonder if we excessively call strbuf_reset() in the real code to
make your version unacceptably expensive?  If not, I somehow feel
that using this version unconditionally may be a better approach.

What happens when a caller calls "strbuf_setlen(&sb, 0)" on a strbuf
that happens to have nothing and whose buffer still points at the
slopbuf (instead of calling _reset())?  Shouldn't your patch fix
that function instead, i.e. something like the following without the
above?  Is that make things noticeably and measurably too expensive?

Thanks.

 strbuf.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/strbuf.h b/strbuf.h
index 2075384e0b..1a77fe146a 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -147,7 +147,10 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len)
 	if (len > (sb->alloc ? sb->alloc - 1 : 0))
 		die("BUG: strbuf_setlen() beyond buffer");
 	sb->len = len;
-	sb->buf[len] = '\0';
+	if (sb->buf != strbuf_slopbuf)
+		sb->buf[len] = '\0';
+	else
+		assert(!strbuf_slopbuf[0]);
 }
 
 /**