On 02/09/07, Johan Herland <johan@xxxxxxxxxxx> wrote: > On Friday 31 August 2007, Junio C Hamano wrote: > > The API needs to justify itself to convince the people who needs > > to learn and adjust to that the benefit far outweighes deviation > > from better known patterns, and I do not see that happening in > > Timo's patch. > > So in general, git people seem to be saying that: > > 1. Yes, we agree that the C string library suX0rs badly. > > 2. There are more than 0 string manipulation bugs (e.g. buffer overflows) in > git. The number may be small or large, but I have yet to see anyone claim > it's _zero_. > > 3. Timo's patches (in their current form) are not the way to go, because of > non-standard API, implementation problems, whatever... > > So why does the discussion end there? Lukas proposed an interesting > alternative in "The Better String Library" ( > http://bstring.sourceforge.net/ ). Why has there been lots of bashing on > Timo's efforts, but no critique of bstring? I'd be very keen to know what > the git developers think of it. AFAICS, it seems to fulfill at least _some_ > of the problems people find in Timo's patches. Specifically, it claims: > > - High performance (better than the C string library) > - Simple usage Performing a brief look at the documentation, the bstring library looks promising. It looks like it has an allocate and grow internal buffers on demand policy. This is similar to what the C++ std::basic_string does, as well as the string helpers in the Boost version of Jam (written in C). This hides the buffer management from the user of the library, rather than obfuscating it like in Timo's patch. The API defined in the documentation is well thought out and extensive, moreso than in the efforts by Timo and others. It has the traditional C API, along with other API found in other string libraries (such as split and join). I am not sure how much of git could make use of these, but they have the pontential to simplify some areas of the codebase. Looking at the documentation, it is clear that this is a well thought out library, both from the problems/security issues of the C library and to how it compares with other string libraries. As well as covering buffer overflow, it also deals with things like integer overflow. They have also done performance tests comparing the bstring library to the C API and C++ std::string. With the C API comparison, the library performs about 10% slower for string assignment, but other areas don't have a slowdown. In fact, string concatenation is _considerably_ improved, something that will help git performance. I suspect (but have not verified) that the slowdown on assignment is due to buffer allocation. > I'd also say it's probably more widely used than Timo's patches. Which is good, as this means that along with the tests in the library, it will be more stable and less likely to be buggy than something that is written from scratch. > If the only response to Timo's highlighting of string manipulation problems > in git, is for us to flame his patches and leave it at that, then I have no > choice but to agree with him in that security does not seem to matter to > us. I would not like to see that happen. It seems that the bstring library will help git in more ways than security, by improving string concatenation performance and giving a richer string API without sacrificing performance (except where noted) and code clarity. It would be interesting to see how the 10% performance drop on string assignment impacts git performance, when balanced with the drastic (92x in the performance table) increase on string concatenation. The only major issue that I can see with bstring is that it does not have a wchar_t version, but git is using chars internally, so this is not a problem for git. - Reece - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html