On Mon, Jun 02, 2008 at 10:59:51PM +0000, Miklos Vajna wrote: > +strbufs has some invariants that are very important to keep in mind: > + > +. The `->buf` member is always malloc-ed, hence strbuf's can be used to > + build complex strings/buffers whose final size isn't easily known. > ++ > +It is *not* legal to copy the `->buf` pointer away. `strbuf_detach()` is > +the operation that detachs a buffer from its shell while keeping the > +shell valid wrt its invariants. > + > +. The `->buf` member is a byte array that has at least `->len + 1` bytes > + allocated. The extra byte is used to store a `'\0'`, allowing the > + `->buf` member to be a valid C-string. Every strbuf function ensure this > + invariant is preserved. > ++ I wouldn't write that part this way at all: I have a personal implementation of things that look like strbufs that are able to use an `alloca`-ed buffers and deals with overflows with malloc's, possibility to stream and so on (implemented with skips, meaning that what is the ->buf member is, isn't always the real start of the allocated memory block). And I don't think it's a good idea to carve such informations into stone. So what I'd says in summary is: (1) it is totally safe to touch anything in the buffer pointed by the buf member between the index 0 and buf->len excluded. (1b) what you write later: it's also possible to write after buf->len if not after strbuf_avail() _BUT_ then you have when you're done the task to reset the (2) invariant yourself, using strbuf_setlen(). (2) ->buf[->len] == '\0' holds _ALL TIME_. (3) ->buf is never ever NULL so it can be used in any usual C string ops safely. (4) do NOT assume anything on what ->buf really is (allocated memory or not e.g.), use strbuf_detach to unwrap a memory buffer from its strbuf shell in a safe way. That is the sole supported way. This will give you a malloced buffer that you can later free(). > +* Life cycle > + > +`strbuf_init`:: > + > + Initializes the structure. The second parameter can be zero or a bigger > + number to allocate memory, in case you want to prevent further reallocs. I'd add that it is _MANDATORY_ to initialize strbufs, and that a static allocation (for global variables e.g.) can be done using the STRBUF_INIT static initializer. > +`strbuf_release`:: > + > + Releases a string buffer and the memory it used. You should not use the > + string buffer after using this function, unless you initialize it again. Actually this is wrong because strbuf_release performs a new init since init allocates 0 memory and that it's idiot-proof. But it could be changed in the future and it should not be relied upon. > +`strbuf_detach`:: > + > + Detaches the string from the string buffer. The function returns a > + pointer to the old string and empties the buffer. Not really strbuf_detach unwraps the embedded buffer for sure, but it doesn't "empties" the buffer, strbuf_detach is like strbuf_release: after a release, strbuf should be init-ed again (even if for now strbuf_release does so). > +`strbuf_attach`:: > + > + Attaches a string to a buffer. You should specify the string to attach, > + the current length of the string and the amount of allocated memory. In addition to what Johannes said: size must be > len. Because the string you pass is supposed to be a NUL-terminated string. > +* Related to the size of the buffer > + > +`strbuf_avail`:: > + > + Determines the amount of allocated but not used memory. > + > +`strbuf_grow`:: > + > + Allocated extra memory for the buffer. I'd put that this way: ensure that at least this amount of available memory is available. This is used when you know a typical size for what you will do and want to avoid repetitive automatic resize of the underlying buffer. This is never a needed operation, but can be critical for performance in some cases. > +`strbuf_setlen`:: > + > + Sets the length of the buffer to a given value. This function does NOT allocate new memory, so you should not perform a strbuf_setlen to a length that is larger than sb->len + strbuf_avail(sb). strbuf_setlen is just meant as a "please fix invariants from this strbuf I just messed with)" > +`strbuf_add`:: > + > + Add data of given length to the buffer. > + > +`strbuf_addstr`:: > + > + Add a NULL-terminated string to the buffer. Please use NUL, '\0' is NUL (as in its ascii name), NULL is (void *)0. In addition to that, I'd say that strbuf_addstr will ALWAYS be implemented as an inline or a macro that expands to: strbuf_add(..., s, strlen(s)) Meaning that this is efficient to write things like: strbuf_addstr(sb, "immediate string"). > +`strbuf_expand`:: This function is a pretty printer that expands magic formats string thanks to callbacks, so that it's done in a generic way. It's what is used to generate git-log e.g. I'm not its author, so I'm not really best placed to describe it. > +`strbuf_fread`:: > + > + Read a given size of data from a FILE* pointer to the buffer. > + > +`strbuf_read`:: > + > + Read the contents of a given file descriptor. The third argument can be > + used to give a hint about the file, to avoid reallocs. > + > +`strbuf_read_file`:: > + > + Read the contents of a file, specified by its path. The third argument > + can be used to give a hint about the file, to avoid reallocs. > + > +`strbuf_getline`:: > + > + Read a line from a FILE* pointer. The second argument specifies the line > + terminator character, like `'\n'`. > + For all: the buffer is rewinded if the read fails. If -1 is returned, errno must be consulted, like you would do for read(3). On Tue, Jun 03, 2008 at 12:00:33AM +0000, Johannes Schindelin wrote: > Hi, > > I am no author of strbuf, but hey, I thought I'd just give you a few > comments... > > In general, I'd rather leave the "->" from the members, since you have > many instances where you access them with ".". > > On Tue, 3 Jun 2008, Miklos Vajna wrote: > > > diff --git a/Documentation/technical/api-strbuf.txt b/Documentation/technical/api-strbuf.txt > > index a52e4f3..3879e0e 100644 > > --- a/Documentation/technical/api-strbuf.txt > > +++ b/Documentation/technical/api-strbuf.txt > > @@ -1,6 +1,175 @@ > > strbuf API > > ========== > > > > -Talk about <strbuf.h> > > +strbuf's can be use in many ways: as a byte array, or to store arbitrary > > +long, overflow safe strings. > > I think that you should not suggest using strbufs as byte array, even if > that is certainly possible. Rather, you should say something like: > > An strbuf is NUL terminated for convenience, but no function in > the strbuf API actually relies on the string being free of NULs. ACK. I'd add the fact that strbuf are meant to be used with all the usual C string and memory APIs. Given that the length of the buffer is known, it's often better to use the mem* functions than a str* one (memchr vs. strchr e.g.). Though, one has to be careful about the fact that str* functions often stop on NULs and that strbufs may have embedded NULs. > > +strbufs has some invariants that are very important to keep in mind: > > + > > +. The `->buf` member is always malloc-ed, hence strbuf's can be used to > > + build complex strings/buffers whose final size isn't easily known. > > Is this true? I thought the initial string is empty, but not alloc'ed. > > So I'd rather have something like > > The "buf" member is never NULL, so you can safely strcmp() it. > > I'd like to see a comment that strbuf's _have_ to be initialized either by > strbuf_init() or by "= STRBUF_INIT" before the invariants, though. Well I'd go even further like I proposed above. > > +`strbuf_attach`:: > > + > > + Attaches a string to a buffer. You should specify the string to attach, > > + the current length of the string and the amount of allocated memory. > > .... This string _must_ be malloc()ed, and after attaching, the pointer > cannot be relied upon anymore, and neither be free()d directly. ACK. -- ·O· Pierre Habouzit ··O madcoder@xxxxxxxxxx OOO http://www.madism.org
Attachment:
pgpnvhKxGMkiC.pgp
Description: PGP signature