On 14.01.2018 21:47, Tanu Kaskinen wrote: > On Fri, 2018-01-12 at 21:47 +0100, Georg Chini wrote: >> On 12.01.2018 16:40, Tanu Kaskinen wrote: >>> On Sun, 2017-10-29 at 20:51 +0100, Georg Chini wrote: >>>> For better readability, "pactl list message-handlers" is introduced which >>>> prints a formatted output of "pactl send-message /core list-handlers". >>>> >>>> The patch also adds the function pa_split_message_response() for easy >>>> parsing of the message response string. >>>> --- >>>> >>>> >>> I would also prefer more fine-grained functions than just one single >>> split function. When parsing lists, this split function will >>> unnecessarily malloc the list before mallocing the individual list >>> elements. >> The top level is an implicit list (see my other mail), so a new string >> will only be allocated if a list is passed as an element of another list. >> >> I agree with you that there should be functions for more complex >> structures, but in this special case it does not make sense to me >> because parsing is simple. >> I also think that the low level function does exactly what it should >> do. It allows you to recursively resolve the string and the additional >> few bytes of memory should not be an issue. Consider a more >> complex case. A message returning some top level variables, a >> simple list and a list of structures which themselves contain simple >> lists like that: >> >> {top_var1} {top_var2} {{l1}{l2} ...} {{{struct1_var1}{struct1_var2} >> {{struct1_l1}{struct1_l2} ...} {struct1_var3}}{{struct2_var1}{struct2_var2} >> {{struct2_l1}{struct2_l2} ...} {struct2_var3}} ...} {top_var3} >> >> The first pass would return the following strings: >> top_var1 >> >> top_var2 >> >> {l1}{l2} ... >> >> {{struct1_var1}{struct1_var2}{{struct1_l1}{struct1_l2} ...} {struct1_var3}} >> {{struct2_var1}{struct2_var2}{{struct2_l1}{struct2_l2} ...} >> {struct2_var3}} ... >> >> top_var3 >> >> So top level is resolved. The simple list can be resolved in the 2nd pass. >> The field of structures will be split in single structures during the 2nd >> pass like that: >> >> {struct1_var1}{struct1_var2}{{struct1_l1}{struct1_l2} ...} {struct1_var3} >> >> {struct2_var1}{struct2_var2}{{struct2_l1}{struct2_l2} ...} {struct2_var3} >> >> ... >> >> Now we can have a function which parses the struct and uses the >> low level function again twice to do so. >> >> I think we should introduce those high level parsing functions whenever >> they are first needed. For me, functions like >> string_to_array_of_{int|float|...}() >> or string_to_struct_xyz() would make sense. > We should have specialized parsing functions also for the simple cases, > like reading a single int or float. We need to specify exactly how e.g. > floats are serialized, and we shouldn't burden every client with the > responsibility of implementing our float serialization rules. Not only > is it more work for them, the likelihood of not following our spec > exactly is very high. I am not against having such parsing functions. You are right, they could avoid allocating and de-allocating a string when a number or a boolean is retrieved. But they are not needed for this patch because it only deals with strings. > > Returning to the topic of mallocs: allocating and freeing memory from > the heap are quite heavy operations, and your approach will do a lot of > those operations. Even reading a simple integer will, I suppose, > involve first allocating a new string. I can't say for sure that the > performance penalty makes any practical difference, but my intuition > says that we should use the approach that I proposed that requires far > fewer heap memory allocations. I can accept the argument that my solution unnecessarily allocates strings in some situations. Though I do not think it has much practical impact, most of it can be avoided by using special functions to retrieve numbers or booleans. I am also quite sure that a function similar to the *_split_in_place() functions can be implemented to completely avoid the allocation of unnecessary strings. > >>> I already provided an example about how I'd like the parsing to work. >>> Apparently you didn't like that for some reason? I'll copy the example >>> here again: >>> >>> const char *state = message_parameters; >>> >>> if (pa_message_read_start_list(&state) < 0) >>> fail("Expected a list."); >>> >>> while (pa_message_read_end_list(&state) < 0) { >>> char *path; >>> char *description; >>> >>> if (pa_message_read_path(&state, &path) < 0) >>> fail("Expected a path."); >>> if (pa_message_read_string(&state, &description) < 0) >>> fail("Expected a string.") >>> >>> /* Do something with path and description. */ >>> >>> pa_xfree(description); >>> pa_xfree(path); >>> } >> As said above, I would prefer the single parsing function until >> the need for more complex functions arises. We should not be >> too fine grained, this will lead to confusion which function must >> be used when. > Can you elaborate, what risk of confusion do you see? From my point of view, your solution has several issues. 1) The example above is not extensible. What is pa_message_read_end_list() supposed to do? Skip to the next element in the list indicated by { and return zero if } is encountered instead? If yes, this would not work if you add an item to the inner list, because at the next iteration it would go to the additional element instead of the next list. On the other hand, If the function is supposed to skip to the end of a list, you cannot use it to iterate over list elements. To make your solution extensible you would at least need three functions: pa_message_find_list_start() - find the beginning of a list pa_message_find_list_end() - skip to the end of a list pa_message_find_list_element() - find the next element in a list All three functions need to handle escaping, nesting and error checking. Your loop would look something like pa_message_find_list_start(&state); while (pa_message_find_list_element(&state)) {     pa_message_find_list_start(&state);     read path     read description     pa_message_find_list_end(&state); } This looks rather complicated compared to while ((element = pa_split_message_response(response, &state))) {    read path    read description    pa_xfree(element); } and I wonder if the additional overhead does not outweigh the allocation and de-allocation of a string. Additionally, your approach might easily lead to problems, because, if the find_list_end() call within the loop is forgotten, it will work with the original message response but will break on extension. 2) You are forced to process the elements in the order they are delivered, taking care of all the nesting. This has two disadvantages: a) Because you do not know if the nesting brackets are closed correctly, you may continue processing elements after a missing bracket. Consider the following example: {{x1} {x2} {x3}} {y1} {y2}. If you forget to close the list like {{x1} {x2} {x3} {y1} {y2}, your approach would continue processing y1 and y2 as part of the list before the error is encountered. split_message_response() however would complain before any of the elements is processed because it searches for the closing bracket and thereby does a consistency check of each element that is returned. b) If you have nested lists, you need to handle them in nested loops like this: pa_message_find_list_start(&state); while (pa_message_find_list_element(&state)) {     pa_message_find_list_start(&state); while (pa_message_find_list_element(&state)) {           ...    } } With my solution it would look like: outer_list = pa_split_message_response(response, &state); while ((inner_list = pa_split_message_response(outer_list, &state2))) {    ... } which again is somewhat simpler and better to read especially if more complex structures are involved. As said above it has the additional advantage that outer_list is already checked for consistency before the while-loop is entered. c) For your solution to work correctly, outermost brackets would be required while they are optional with my code. This is not important, but I thought I mention it anyway. If my approach will be followed, the big advantage of adding outer brackets is that the initial split_message_response() call would do a consistency check of the whole string before anything is processed. 3) To answer your original question, in your example it is a bit confusing that there is a read_path() and a read_description() function. Both are strings, so why have different functions? The functions we are adding should not be too specialized. > > At this point I'm not convinced by your reasoning, and I feel pretty > strongly about the parsing design. If you want to continue with the > "single split function" approach, getting Arun on your side is likely > going to be needed for me to give up on this. > I also feel pretty strongly about my approach, so if the arguments above do not convince you, we should involve Arun. As said above, I don't mind having more specialized split functions, but I think they need not be part of this patch.