On 09/10/2012 07:47 AM, Junio C Hamano wrote: > Michael Haggerty <mhagger@xxxxxxxxxxxx> writes: > >> ... Consider something like >> >> struct string_list *split_file_into_words(FILE *f) >> { >> char buf[1024]; >> struct string_list *list = new string list; >> list->strdup_strings = 1; >> while (not EOF) { >> read_line_into_buf(); >> string_list_split_in_place(list, buf, ' ', -1); >> } >> return list; >> } > > That is a prime example to argue that string_list_split() would make > more sense, no? The caller does _not_ mind if the function mucks > with buf, but the resulting list is not allowed to point into buf. > > In such a case, the caller shouldn't have to _care_ if it wants to > allow buf to be mucked with; it is already asking that the resulting > list _not_ point into buf by setting strdup_strings (by the way, > that is part of the function input, so think of it like various *opt > variables passed into functions to tweak their behaviour). If the > implementation can do so without sacrificing performance (and in > this case, as you said, it can), it should take "const char *buf". You're right, I was thinking that a caller of string_list_split_in_place() could choose to remain ignorant of whether strdup_strings is set, but this is incorrect because it needs to know whether to do its own memory management of the strings that it passes into string_list_append(). > So it appears to me that sl_split_in_place(), if implemented, should > be kept as a special case for performance-minded callers that have > full control of the lifetime rules of the variables they use, can > set strdup_strings to false, and can let buf modified in place, and > can accept list that point into buf. OK, so the bottom line would be to have two versions of the function. One takes a (const char *) and *requires* strdup_strings to be set on its input list: int string_list_split(struct string_list *list, const char *string, int delim, int maxsplit) { assert(list->strdup_strings); ... } The other takes a (char *) and modifies it in-place, and maybe even requires strdup_strings to be false on its input list: int string_list_split_in_place(struct string_list *list, char *string, int delim, int maxsplit) { /* not an error per se but a strong suggestion of one: */ assert(!list->strdup_strings); ... } (The latter (modulo assert) is the one that I have implemented, but it might not be needed immediately.) Do you agree? >>>> + * Examples: >>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"] >>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"] >>>> + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""] >>> >>> I would find it more natural to see a sentinel value against >>> "positive" to be 0, not -1. "-1" gives an impression as if "-2" >>> might do something different from "-1", but Zero is a lot more >>> special. >> >> You have raised a good point and I think there is a flaw in the API, but >> I'm not sure I agree with you what the flaw is... >> >> The "maxsplit" argument limits the number of times the string should be >> split. I.e., if maxsplit is set, then the output will have at most >> (maxsplit + 1) strings. > > So "do not split, just give me the whole thing" would be maxsplit == 0 > to split into (maxsplit+1) == 1 string. I think we are in agreement > that your "-1" does not make any sense, and your documentation that > said "positive" is the saner thing to do, no? No. I think that my handling of maxsplit=0 was incorrect but that we should continue using -1 as the special value. I see maxsplit=0 as a legitimate degenerate case meaning "split into 1 string". Granted, it would only be useful in specialized situations [1]. Moreover, "-1" makes a much more obvious special value than "0"; somebody reading code with maxsplit=-1 knows immediately that this is a special value, whereas the handling of maxsplit=0 isn't quite so blindingly obvious unless the reader knows the outcome of our current discussion :-) Therefore I still prefer treating only negative values of maxsplit to mean "unlimited" and fixing maxsplit=0 as described. But if you insist on the other convention, let me know and I will change it. Michael [1] A case I can think of would be parsing a format like NUMPARENTS [PARENT...] SUMMARY where "string_list_split(list, rest_of_line, ' ', numparents)" does the right thing even if numparents==0. -- Michael Haggerty mhagger@xxxxxxxxxxxx http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html