Re: strbuf API

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Monday 03 September 2007, Pierre Habouzit wrote:
> 
>   I read the recent thread Timo Sirainen raised about string APIs in
> git. ANd I went read the strbuf.[hc] module.
> 
>   I believe that the choice made in that module are wrong and could be
> made better. I actually use to work with a string buffer API (that
> interested readers can look at on [0]), that work almost the same
> (except for the eof flag, but it's trivial to keep), but have
> two significant differences:
> 
>   First, and that's the most important one: the buffer is always NUL
> terminated, after its official "len". That means, in terms of strbuf
> API, that "alloc" is always greater or equal to len+1 to be able to
> store the ending NUL[1]. The advantages are obvious: you can pass the
> buffer to any legacy C string function without any fear of read
> overflow.  strtol comes to mind, and atm, git has to explicitely use
> strbuf_end to put that ending nul to be able to call legacy
> applications. But once done, the NUL is accounted into the string (aka
> it's in "len") which makes it a non C-string (I mean you cannot append
> any more data in it anymore). So current implementations tries to
> workaround an issue (the non systematical NUL-termination) but IMHO the
> wrong way.
> 
>   The other shortcoming is that you cannot tell the buffer "Hey, it's
> very likely that you'll end up being _that_ long. That's why, in some
> parts of the code (see write_tar_entry in archive-tar.c e.g.) the
> programmer actually messes with the buffer allocation, outside from the
> strbuf module, which makes it well, useless. In my API, I have a
> "buffer_ensure" call, that is supposed to do that: "please ensure that
> this buffer still has _this_ amount of free and allocated space to put
> more data".
> 
> 
>   So my question is, do people think I raise a valid point

Yes.

>   , and would 
> patches that would refactor the strbuf module to have those functions,
> and would fix the code that uses strbuf's to interact properly, be
> accepted ?

I don't know. Keep in mind that there is a parallel process (cf. the 
continuation of the "Buffer overflows" thread) to evaluate the Bstring 
library ( http://bstring.sourceforge.net/ ), and possibly substitute that 
for the strbuf "module".

I wouldn't want the work done by you or others to be wasted (depending on 
which solution wins out in the end), so I suggest you take a look at 
bstring and offer up arguments why your solution would be better for git.

>   Also, the efficiency of the buffer module API I use has a lot to do
> with the fact that copying functions, and length tests are inlined in
> the .h, so that the compiler can optimize the ones it already tested 10
> calls before. I'm not sure if this is frowned upon or if it makes sense.
> 
> 
>   [0] http://git.madism.org/?p=pfixtools.git;a=blob;f=buffer.h;hb=HEAD
>       http://git.madism.org/?p=pfixtools.git;a=blob;f=buffer.c;hb=HEAD
> 
>   [1] Of course, ensuring the NUL-termination has a cost, though it's
>       often benign, and for performance-critical places where characters
>       are copied one by one, it's always possible to use an "unsafe"
>       addch (that would not maintain the invariant), and then call an
>       equivalent of strbuf_end (that would not append a \0 like it does
>       now, but just would fix the invariant that for any strbuf,
>       buf->buf[buf->len] == '\0') explicitely. For places where the
>       invariant generate negligible cost (like concatenating two paths
>       parts with a middle '/' e.g.) then we gain safety without even
>       having to think about it.

All the points you raise are valid, but please have in mind that all 
programmers write their own string library at some point. (Writing a string 
library is right up there with "discovering why 'goto' is bad" and "2nd 
system syndrome" on the top list of rites of passage all programmers (need 
to) go through. I'm guessing 99% of the people on this list have written 
their own string library. Why not 100%? Well, Linus have obviously not, 
else we'd be using his already... ;)

I'd like to know how your solution compare with the alternatives (in this 
case bstring): What's the performance compared to bstring? How much thought 
has been given to the usability of the API? What functionality is covered 
(compared to bstring)? In short: Why is your alternative the best for git?


...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux