On 15 August 2017 at 20:43, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Martin Ågren <martin.agren@xxxxxxxxx> writes: > >> If two threads have one freshly initialized string buffer each and call >> strbuf_reset on them at roughly the same time, both threads will be >> writing a '\0' to strbuf_slopbuf. That is not a problem in practice >> since it doesn't matter in which order the writes happen. But >> ThreadSanitizer will consider this a race. >> >> When compiling with GIT_THREAD_SANITIZER, avoid writing to >> strbuf_slopbuf. Let's instead assert on the first byte of strbuf_slopbuf >> being '\0', since it ensures the promised invariant of "buf[len] == >> '\0'". (Writing to strbuf_slopbuf is normally bad, but could become even >> more bad if we stop covering it up in strbuf_reset.) >> >> Signed-off-by: Martin Ågren <martin.agren@xxxxxxxxx> >> --- >> strbuf.h | 12 ++++++++++++ >> 1 file changed, 12 insertions(+) >> >> diff --git a/strbuf.h b/strbuf.h >> index e705b94db..295654d39 100644 >> --- a/strbuf.h >> +++ b/strbuf.h >> @@ -153,7 +153,19 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) >> /** >> * Empty the buffer by setting the size of it to zero. >> */ >> +#ifdef GIT_THREAD_SANITIZER >> +#define strbuf_reset(sb) \ >> + do { \ >> + struct strbuf *_sb = sb; \ >> + _sb->len = 0; \ >> + if (_sb->buf == strbuf_slopbuf) \ >> + assert(!strbuf_slopbuf[0]); \ >> + else \ >> + _sb->buf[0] = '\0'; \ >> + } while (0) >> +#else >> #define strbuf_reset(sb) strbuf_setlen(sb, 0) >> +#endif >> >> >> /** > > The strbuf_slopbuf[] is a shared resource that is expected by > everybody to stay a holder of a NUL. Even though it is defined as > "char [1]", it in spirit ought to be considered const. And from > that point of view, your new definition that is conditionally used > only when sanitizer is in use _is_ the more correct one than the > current "we do not care if it is slopbuf, we are writing \0 so it > will be no-op anyway" code. > > I wonder if we excessively call strbuf_reset() in the real code to > make your version unacceptably expensive? If not, I somehow feel > that using this version unconditionally may be a better approach. > > What happens when a caller calls "strbuf_setlen(&sb, 0)" on a strbuf > that happens to have nothing and whose buffer still points at the > slopbuf (instead of calling _reset())? Shouldn't your patch fix > that function instead, i.e. something like the following without the > above? Is that make things noticeably and measurably too expensive? Good thinking. There are about 300 users of strbuf_reset and 10 users of strbuf_setlen(., 0) with a literal zero. Obviously, there might be more users which end up setting the length to 0 for some reason or other. So your idea seems the better one. I would assume that whoever resets a buffer is about to add something to it, which should be more expensive, but that's obviously just hand-waving. I'll see if I can find some interesting caller and/or performance numbers. > strbuf.h | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/strbuf.h b/strbuf.h > index 2075384e0b..1a77fe146a 100644 > --- a/strbuf.h > +++ b/strbuf.h > @@ -147,7 +147,10 @@ static inline void strbuf_setlen(struct strbuf *sb, size_t len) > if (len > (sb->alloc ? sb->alloc - 1 : 0)) > die("BUG: strbuf_setlen() beyond buffer"); > sb->len = len; > - sb->buf[len] = '\0'; > + if (sb->buf != strbuf_slopbuf) > + sb->buf[len] = '\0'; > + else > + assert(!strbuf_slopbuf[0]); > } > > /** When writing my patch, I used assert() and figured that with tsan, we're in some sort of "debug"-mode anyway. If we decide to always do the check, would it make sense to do "else if (strbuf_slopbuf[0]) BUG(..);" instead of the assert? Or, if we do prefer the assert, would the performance-worry be moot? Thanks for the feedback.