[PATCH 5/5] pactl: Implement list message-handlers

georg@xxxxxxxx (Georg Chini) · Fri, 19 Jan 2018 14:23:09 +0100

On 18.01.2018 23:20, Tanu Kaskinen wrote:
> On Wed, 2018-01-17 at 16:12 +0100, Georg Chini wrote:
>> On 14.01.2018 21:47, Tanu Kaskinen wrote:ry from
>>> the heap are quite heavy operations, and your approach will do a lot of
>>> those operations. Even reading a simple integer will, I suppose,
>>> involve first allocating a new string. I can't say for sure that the
>>> performance penalty makes any practical difference, but my intuition
>>> says that we should use the approach that I proposed that requires far
>>> fewer heap memory allocations.
>> I can accept the argument that my solution unnecessarily allocates
>> strings in some situations. Though I do not think it has much practical
>> impact, most of it can be avoided by using special functions to retrieve
>> numbers or booleans. I am also quite sure that a function similar to the
>> *_split_in_place() functions can be implemented to completely avoid
>> the allocation of unnecessary strings.
> An in-place split function seems feasible indeed.
>
>>>>> I already provided an example about how I'd like the parsing to work.
>>>>> Apparently you didn't like that for some reason? I'll copy the example
>>>>> here again:
>>>>>
>>>>> const char *state = message_parameters;
>>>>>
>>>>> if (pa_message_read_start_list(&state) < 0)
>>>>>        fail("Expected a list.");
>>>>>
>>>>> while (pa_message_read_end_list(&state) < 0) {
>>>>>        char *path;
>>>>>        char *description;
>>>>>
>>>>>        if (pa_message_read_path(&state, &path) < 0)
>>>>>            fail("Expected a path.");
>>>>>        if (pa_message_read_string(&state, &description) < 0)
>>>>>            fail("Expected a string.")
>>>>>
>>>>>        /* Do something with path and description. */
>>>>>
>>>>>        pa_xfree(description);
>>>>>        pa_xfree(path);
>>>>> }
>>>> As said above, I would prefer the single parsing function until
>>>> the need for more complex functions arises. We should not be
>>>> too fine grained, this will lead to confusion which function must
>>>> be used when.
>>> Can you elaborate, what risk of confusion do you see?
>>   From my point of view, your solution has several issues.
>>
>> 1) The example above is not extensible. What is pa_message_read_end_list()
>> supposed to do? Skip to the next element in the list indicated by { and
>> return
>> zero if } is encountered instead? If yes, this would not work if you add
>> an item
>> to the inner list, because at the next iteration it would go to the
>> additional
>> element instead of the next list. On the other hand, If the function is
>> supposed
>> to skip to the end of a list, you cannot use it to iterate over list
>> elements.
> pa_message_read_end_list() is supposed to consume the closing brace
> (and any random characters, since those are now allowed), nothing else.
> If it encounters an opening brace or end-of-string, it's supposed to
> fail.
>
> My example does not work with the current list-handlers message, that's
> true.
>
>> To
>> make your solution extensible you would at least need three functions:
>>
>> pa_message_find_list_start() - find the beginning of a list
>> pa_message_find_list_end() - skip to the end of a list
>> pa_message_find_list_element() - find the next element in a list
>>
>> All three functions need to handle escaping, nesting and error checking.
> I don't know why you added "find" to the function names. You're
> probably guessing wrong the semantics I intended to have with
> pa_message_list_start() and pa_message_list_end().
>
> pa_message_read_list_start() should only consume the next opening brace
> and any whitespace preceding it (by "whitespace" I mean any ignorable
> characters). If it encounters a closing brace or end-of-string, it
> should fail. There's no need for escaping or nesting handling.
>
> pa_message_read_list_end() is explained above.
>
> A "skip-to-end" function is indeed needed, and it will have to handle
> escaping and nesting.
>
> Here's an updated example that should work with the current list-
> handlers (with an "implicit" top-level list):
>
> while (pa_message_read_list_start(&state) > 0) {
>      read path
>      read description
>      pa_message_skip_rest_of_list(&state) /* Consumes the list closing brace and everything preceding it. */
> }
>
> if (!pa_message_eof(&state)) {
>      /* This can happen if an unexpected closing brace is encountered. */
>      fail("Malformed data");
> }
>
> As you can see, the updated example is exactly the same length as your
> proposal, except for the pa_message_eof() check that you didn't have,
> but you should have a similar check, because otherwise you don't know
> if the parsing loop ended because of end-of-string or malformed input.

I do not need the eof check because my function reads a complete
element at a time and fails if there is any inconsistency. So your code
is still much longer.
How do you distinguish between the cases where the end of string
is hit by pa_message_read_start() which is OK because then we
finished processing and the case where the end of the string is
hit by pa_message_skip_rest_of_list() which is not OK because it
means a closing bracket is missing?
Also you need to handle the case where the end of a list does not
match the end of the string because there are further elements
behind it.

>
>> Your
>> loop would look something like
>>
>> pa_message_find_list_start(&state);
>> while (pa_message_find_list_element(&state)) {
>>        pa_message_find_list_start(&state);
>>        read path
>>        read description
>>        pa_message_find_list_end(&state);
>> }
>>
>> This looks rather complicated compared to
>>
>> while ((element = pa_split_message_response(response, &state))) {
>>       read path
>>       read description
>>       pa_xfree(element);
>> }
>>
>> and I wonder if the additional overhead does not outweigh the allocation
>> and de-allocation of a string. Additionally, your approach might easily lead
>> to problems, because, if the find_list_end() call within the loop is
>> forgotten,
>> it will work with the original message response but will break on extension.
> My updated example won't work with the original message response,
> because pa_message_skip_rest_of_list() is required to consume the list
> closing brace. If that is forgotten, then pa_message_read_list_start()
> will fail, the loop will terminate, and the !pa_message_eof() will
> trigger failure.
>
>> 2) You are forced to process the elements in the order they are delivered,
>> taking care of all the nesting. This has two disadvantages:
>>
>> a) Because you do not know if the nesting brackets are closed correctly, you
>> may continue processing elements after a missing bracket. Consider the
>> following example: {{x1} {x2} {x3}} {y1} {y2}. If you forget to close the
>> list like {{x1} {x2} {x3} {y1} {y2}, your approach would continue processing
>> y1 and y2 as part of the list before the error is encountered.
>>
>> split_message_response() however would complain before any of the elements
>> is processed because it searches for the closing bracket and thereby does a
>> consistency check of each element that is returned.
> You're right, but I don't see why this is such a bad thing. The error
> will be caught eventually.

If you only parse the data it is not an issue but if you are already 
using the
values in the parsing loop this might cause problems. Let me make a (not
very realistic) example: We have a 5.1 output and the x1 to x3 are used
to change the volume of the first three channels. Now you would also
change the volume of the 4th and 5th channel to some random value
before you hit an error.

>
>> b) If you have nested lists, you need to handle them in nested loops
>> like this:
>>
>> pa_message_find_list_start(&state);
>> while (pa_message_find_list_element(&state)) {
>>
>>        pa_message_find_list_start(&state);
>> while (pa_message_find_list_element(&state)) {
>>              ...
>>       }
>> }
>>
>> With my solution it would look like:
>>
>> outer_list = pa_split_message_response(response, &state);
>>
>> while ((inner_list = pa_split_message_response(outer_list, &state2))) {
>>       ...
>> }
>>
>> which again is somewhat simpler and better to read especially if more
>> complex structures are involved. As said above it has the additional
>> advantage that outer_list is already checked for consistency before the
>> while-loop is entered.
> I don't understand. What data are we parsing? Why does the first
> example need nested loops but the second doesn't? If the inner list is
> an array that needs iteration, then you should loop in the code that
> you replaced with "..." in the second example. If the inner list is a
> structure like what we have to deal with list-handlers, then the first
> example doesn't need the inner loop, it can read the struct members in
> the outer loop.

Let's say I am parsing a list of heterogeneous elements where some
elements contain sub-elements.

With my code, I can do something like

i = 0;
while ((element[i] = pa_split_message_response(response, &state)))
 Â  Â Â  i++;

parse element[3]
parse element[5]

With your code you would have to do the parsing of the more complex
elements within the loop. The ability to take things out of the loop in
certain situation enhances the readability of the code.

>
>> c) For your solution to work correctly, outermost brackets would be required
>> while they are optional with my code. This is not important, but I thought I
>> mention it anyway. If my approach will be followed, the big advantage of
>> adding outer brackets is that the initial split_message_response() call
>> would
>> do a consistency check of the whole string before anything is processed.
> My approach can work just fine with an implicit top-level list, since
> that's what the code example that I gave in this mail does.

But now your code needs additional pa_read_list_start() and
pa_read_list_end() calls for non top level lists because they
are enclosed in brackets. pa_split_message_response() always
assumes it receives a list without outer brackets.

>
>> 3) To answer your original question, in your example it is a bit confusing
>> that there is a read_path() and a read_description() function. Both are
>> strings, so why have different functions? The functions we are adding
>> should not be too specialized.
> The idea behind read_path() is that if object paths have constraints on
> what is a valid value, then read_path() can do validation that plain
> read_string() can't.

That's exactly what I want to avoid and where I see the risk of
confusion. I do not want to have multiple functions with different
error checking for the same data type. I think the error checking
of the values does not belong into the parsing functions.

>
> I hope you understand my approach better now. I don't konw if that
> makes it any easier to accept it, though.
>
> There's another topic we should discuss: how should the writing API
> look like? This can have some effect on the parsing API as well. When
> writing data to the parameter string, the whole string can't be easily
> allocated all in one go, so we should have something like pa_strbuf
> available for clients. I don't think we should add pa_strbuf at least
> directly to the client API, because the message parameter writing has
> some specific conventions (i.e. wrapping everything in curly braces),
> and while it would be possible to add the specialized functions to the
> pa_strbuf API, I think that would be somewhat ugly. So I propose we
> introduce a new API similar to pa_strbuf.

I took the simple approach and added a function prepare_message_string()
which returns an escaped string. So whenever there is a risk that the
string contains any characters that need to be escaped, one can use
prepare_message_string(string) instead of the string itself. I think it does
not make sense to pass every value through the escaping, we only need
it for strings and only for those strings that have a risk of containing
unwanted characters. My pa_split_message_response() function then
removes the escaping when it reaches the innermost braces.

Adding the braces need not be done in a special function. On the contrary,
the caller should take care and keep track of the braces. It must do so
anyway if it is passing lists, so the additional overhead of adding one more
pair of brackets should be acceptable.

>
> My suggestion would be to introduce a new pa_message struct that has
> both reading and writing functions (e.g. pa_message_read_string(),
> pa_message_write_string()). It would be analogous to DBusMessage, so it
> would do a bit more than just string manipulation (it would at least
> carry the recipient/sender and message name information). I'm not sure
> you like that, because I've got the impression that you want to call
> only the "method calls" messages, and in that case replies and signals
> can not be called messages. Nevertheless, my preference is to call all
> of those three things messages, and in that case it makes sense to have
> pa_message that represents all three things, so that's what I'm going
> to propose here.

I agree that all three can be called messages so I am fine with
your terminology, but I do not see the need for the structure you
propose. I have been trying to keep the interface as simple and
transparent as possible and I don't think adding layers of abstraction
helps achieving that goal.

>
> pa_context_send_message() would take a pa_message as an argument. The
> reply callback would get a pa_message as an argument. Signal callbacks
> would get a pa_message as an argument. So things would work very
> similarly to how D-Bus works. One major difference to D-Bus would be
> that errors wouldn't be messages, at least for now (earlier there was
> some talk about making errors more informative, and at that point using
> pa_message for representing errors as well might make sense).

Did we not agree that the client should only receive a string as a reply?
If we can pass structures, we don't need all the overhead of transforming
the variables into strings.
Concerning messages and signals, they already receive the same set
of parameters, so again no need for some structure.

Let me add one more complication: What about passing floats
when server and client are using different locales? We probably
have to provide some float_to_string() function to produce results
independent onÂ  the locale. I have been thinking about the "," or "."
problem but could not come up with a good solution so far.
Also which precision should be used? Full possible number of digits?

Anyway, how are we going to proceed from here? I still think that
your proposal adds unnecessary complexity and is more difficult
to implement while you don't like my idea.
On the other hand I guess we both don't want to drop this patch
set only because we cannot agree on parsing. Do you see some
compromise? Should we ask Arun?