Re: [PATCH 1/4] Add a new function, string_list_split_in_place()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09/10/2012 07:47 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> ...  Consider something like
>>
>>     struct string_list *split_file_into_words(FILE *f)
>>     {
>>         char buf[1024];
>>         struct string_list *list = new string list;
>>         list->strdup_strings = 1;
>>         while (not EOF) {
>>             read_line_into_buf();
>>             string_list_split_in_place(list, buf, ' ', -1);
>>         }
>>         return list;
>>     }
> 
> That is a prime example to argue that string_list_split() would make
> more sense, no?  The caller does _not_ mind if the function mucks
> with buf, but the resulting list is not allowed to point into buf.
> 
> In such a case, the caller shouldn't have to _care_ if it wants to
> allow buf to be mucked with; it is already asking that the resulting
> list _not_ point into buf by setting strdup_strings (by the way,
> that is part of the function input, so think of it like various *opt
> variables passed into functions to tweak their behaviour).  If the
> implementation can do so without sacrificing performance (and in
> this case, as you said, it can), it should take "const char *buf".

You're right, I was thinking that a caller of
string_list_split_in_place() could choose to remain ignorant of whether
strdup_strings is set, but this is incorrect because it needs to know
whether to do its own memory management of the strings that it passes
into string_list_append().

> So it appears to me that sl_split_in_place(), if implemented, should
> be kept as a special case for performance-minded callers that have
> full control of the lifetime rules of the variables they use, can
> set strdup_strings to false, and can let buf modified in place, and
> can accept list that point into buf.

OK, so the bottom line would be to have two versions of the function.
One takes a (const char *) and *requires* strdup_strings to be set on
its input list:

int string_list_split(struct string_list *list, const char *string,
		      int delim, int maxsplit)
{
	assert(list->strdup_strings);
	...
}

The other takes a (char *) and modifies it in-place, and maybe even
requires strdup_strings to be false on its input list:

int string_list_split_in_place(struct string_list *list, char *string,
			       int delim, int maxsplit)
{
	/* not an error per se but a strong suggestion of one: */
	assert(!list->strdup_strings);
	...
}

(The latter (modulo assert) is the one that I have implemented, but it
might not be needed immediately.)  Do you agree?

>>>> + * Examples:
>>>> + *   string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"]
>>>> + *   string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"]
>>>> + *   string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""]
>>>
>>> I would find it more natural to see a sentinel value against
>>> "positive" to be 0, not -1.  "-1" gives an impression as if "-2"
>>> might do something different from "-1", but Zero is a lot more
>>> special.
>>
>> You have raised a good point and I think there is a flaw in the API, but
>> I'm not sure I agree with you what the flaw is...
>>
>> The "maxsplit" argument limits the number of times the string should be
>> split.  I.e., if maxsplit is set, then the output will have at most
>> (maxsplit + 1) strings.
> 
> So "do not split, just give me the whole thing" would be maxsplit == 0
> to split into (maxsplit+1) == 1 string.  I think we are in agreement
> that your "-1" does not make any sense, and your documentation that
> said "positive" is the saner thing to do, no?

No.  I think that my handling of maxsplit=0 was incorrect but that we
should continue using -1 as the special value.

I see maxsplit=0 as a legitimate degenerate case meaning "split into 1
string".  Granted, it would only be useful in specialized situations
[1].  Moreover, "-1" makes a much more obvious special value than "0";
somebody reading code with maxsplit=-1 knows immediately that this is a
special value, whereas the handling of maxsplit=0 isn't quite so
blindingly obvious unless the reader knows the outcome of our current
discussion :-)

Therefore I still prefer treating only negative values of maxsplit to
mean "unlimited" and fixing maxsplit=0 as described.  But if you insist
on the other convention, let me know and I will change it.

Michael

[1] A case I can think of would be parsing a format like

    NUMPARENTS [PARENT...] SUMMARY

where "string_list_split(list, rest_of_line, ' ', numparents)" does the
right thing even if numparents==0.

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]