[PATCH 5/5] pactl: Implement list message-handlers

tanuk@xxxxxx (Tanu Kaskinen) · Sun, 21 Jan 2018 02:03:01 +0200

On Fri, 2018-01-19 at 14:23 +0100, Georg Chini wrote:
> On 18.01.2018 23:20, Tanu Kaskinen wrote:
> > On Wed, 2018-01-17 at 16:12 +0100, Georg Chini wrote:
> > > On 14.01.2018 21:47, Tanu Kaskinen wrote:ry from
> > > > the heap are quite heavy operations, and your approach will do a lot of
> > > > those operations. Even reading a simple integer will, I suppose,
> > > > involve first allocating a new string. I can't say for sure that the
> > > > performance penalty makes any practical difference, but my intuition
> > > > says that we should use the approach that I proposed that requires far
> > > > fewer heap memory allocations.
> > > 
> > > I can accept the argument that my solution unnecessarily allocates
> > > strings in some situations. Though I do not think it has much practical
> > > impact, most of it can be avoided by using special functions to retrieve
> > > numbers or booleans. I am also quite sure that a function similar to the
> > > *_split_in_place() functions can be implemented to completely avoid
> > > the allocation of unnecessary strings.
> > 
> > An in-place split function seems feasible indeed.
> > 
> > > > > > I already provided an example about how I'd like the parsing to work.
> > > > > > Apparently you didn't like that for some reason? I'll copy the example
> > > > > > here again:
> > > > > > 
> > > > > > const char *state = message_parameters;
> > > > > > 
> > > > > > if (pa_message_read_start_list(&state) < 0)
> > > > > >        fail("Expected a list.");
> > > > > > 
> > > > > > while (pa_message_read_end_list(&state) < 0) {
> > > > > >        char *path;
> > > > > >        char *description;
> > > > > > 
> > > > > >        if (pa_message_read_path(&state, &path) < 0)
> > > > > >            fail("Expected a path.");
> > > > > >        if (pa_message_read_string(&state, &description) < 0)
> > > > > >            fail("Expected a string.")
> > > > > > 
> > > > > >        /* Do something with path and description. */
> > > > > > 
> > > > > >        pa_xfree(description);
> > > > > >        pa_xfree(path);
> > > > > > }
> > > > > 
> > > > > As said above, I would prefer the single parsing function until
> > > > > the need for more complex functions arises. We should not be
> > > > > too fine grained, this will lead to confusion which function must
> > > > > be used when.
> > > > 
> > > > Can you elaborate, what risk of confusion do you see?
> > > 
> > >   From my point of view, your solution has several issues.
> > > 
> > > 1) The example above is not extensible. What is pa_message_read_end_list()
> > > supposed to do? Skip to the next element in the list indicated by { and
> > > return
> > > zero if } is encountered instead? If yes, this would not work if you add
> > > an item
> > > to the inner list, because at the next iteration it would go to the
> > > additional
> > > element instead of the next list. On the other hand, If the function is
> > > supposed
> > > to skip to the end of a list, you cannot use it to iterate over list
> > > elements.
> > 
> > pa_message_read_end_list() is supposed to consume the closing brace
> > (and any random characters, since those are now allowed), nothing else.
> > If it encounters an opening brace or end-of-string, it's supposed to
> > fail.
> > 
> > My example does not work with the current list-handlers message, that's
> > true.
> > 
> > > To
> > > make your solution extensible you would at least need three functions:
> > > 
> > > pa_message_find_list_start() - find the beginning of a list
> > > pa_message_find_list_end() - skip to the end of a list
> > > pa_message_find_list_element() - find the next element in a list
> > > 
> > > All three functions need to handle escaping, nesting and error checking.
> > 
> > I don't know why you added "find" to the function names. You're
> > probably guessing wrong the semantics I intended to have with
> > pa_message_list_start() and pa_message_list_end().
> > 
> > pa_message_read_list_start() should only consume the next opening brace
> > and any whitespace preceding it (by "whitespace" I mean any ignorable
> > characters). If it encounters a closing brace or end-of-string, it
> > should fail. There's no need for escaping or nesting handling.
> > 
> > pa_message_read_list_end() is explained above.
> > 
> > A "skip-to-end" function is indeed needed, and it will have to handle
> > escaping and nesting.
> > 
> > Here's an updated example that should work with the current list-
> > handlers (with an "implicit" top-level list):
> > 
> > while (pa_message_read_list_start(&state) > 0) {
> >      read path
> >      read description
> >      pa_message_skip_rest_of_list(&state) /* Consumes the list closing brace and everything preceding it. */
> > }
> > 
> > if (!pa_message_eof(&state)) {
> >      /* This can happen if an unexpected closing brace is encountered. */
> >      fail("Malformed data");
> > }
> > 
> > As you can see, the updated example is exactly the same length as your
> > proposal, except for the pa_message_eof() check that you didn't have,
> > but you should have a similar check, because otherwise you don't know
> > if the parsing loop ended because of end-of-string or malformed input.
> 
> I do not need the eof check because my function reads a complete
> element at a time and fails if there is any inconsistency. So your code
> is still much longer.
> How do you distinguish between the cases where the end of string
> is hit by pa_message_read_start() which is OK because then we
> finished processing and the case where the end of the string is
> hit by pa_message_skip_rest_of_list() which is not OK because it
> means a closing bracket is missing?

I didn't think this through properly earlier, but the functions should
return different values for end-of-string and malformed input. The
pa_message_eof() call should then be replaced with checking the
pa_message_read_list_start() return value (not that it makes much
difference in practice, but checking for errors indirectly via eof is
less clear than checking for errors directly).

> Also you need to handle the case where the end of a list does not
> match the end of the string because there are further elements
> behind it.

Why? There can't be any further elements behind the implicit list. As I
said earlier, the drawback with the implicit list is that the message
parameters aren't extensible. If the list is explicit, then it's
possible to detect when the list ends and other parameters begin.

> > 
> > > Your
> > > loop would look something like
> > > 
> > > pa_message_find_list_start(&state);
> > > while (pa_message_find_list_element(&state)) {
> > >        pa_message_find_list_start(&state);
> > >        read path
> > >        read description
> > >        pa_message_find_list_end(&state);
> > > }
> > > 
> > > This looks rather complicated compared to
> > > 
> > > while ((element = pa_split_message_response(response, &state))) {
> > >       read path
> > >       read description
> > >       pa_xfree(element);
> > > }
> > > 
> > > and I wonder if the additional overhead does not outweigh the allocation
> > > and de-allocation of a string. Additionally, your approach might easily lead
> > > to problems, because, if the find_list_end() call within the loop is
> > > forgotten,
> > > it will work with the original message response but will break on extension.
> > 
> > My updated example won't work with the original message response,
> > because pa_message_skip_rest_of_list() is required to consume the list
> > closing brace. If that is forgotten, then pa_message_read_list_start()
> > will fail, the loop will terminate, and the !pa_message_eof() will
> > trigger failure.
> > 
> > > 2) You are forced to process the elements in the order they are delivered,
> > > taking care of all the nesting. This has two disadvantages:
> > > 
> > > a) Because you do not know if the nesting brackets are closed correctly, you
> > > may continue processing elements after a missing bracket. Consider the
> > > following example: {{x1} {x2} {x3}} {y1} {y2}. If you forget to close the
> > > list like {{x1} {x2} {x3} {y1} {y2}, your approach would continue processing
> > > y1 and y2 as part of the list before the error is encountered.
> > > 
> > > split_message_response() however would complain before any of the elements
> > > is processed because it searches for the closing bracket and thereby does a
> > > consistency check of each element that is returned.
> > 
> > You're right, but I don't see why this is such a bad thing. The error
> > will be caught eventually.
> 
> If you only parse the data it is not an issue but if you are already 
> using the
> values in the parsing loop this might cause problems. Let me make a (not
> very realistic) example: We have a 5.1 output and the x1 to x3 are used
> to change the volume of the first three channels. Now you would also
> change the volume of the 4th and 5th channel to some random value
> before you hit an error.

Your approach doesn't actually help here. pa_split_message_response()
can't do much more than check that the braces match. It doesn't know
the types of the contained values, so the values can still be invalid
even if pa_split_message_response() thinks everything is fine.

> > 
> > > b) If you have nested lists, you need to handle them in nested loops
> > > like this:
> > > 
> > > pa_message_find_list_start(&state);
> > > while (pa_message_find_list_element(&state)) {
> > > 
> > >        pa_message_find_list_start(&state);
> > > while (pa_message_find_list_element(&state)) {
> > >              ...
> > >       }
> > > }
> > > 
> > > With my solution it would look like:
> > > 
> > > outer_list = pa_split_message_response(response, &state);
> > > 
> > > while ((inner_list = pa_split_message_response(outer_list, &state2))) {
> > >       ...
> > > }
> > > 
> > > which again is somewhat simpler and better to read especially if more
> > > complex structures are involved. As said above it has the additional
> > > advantage that outer_list is already checked for consistency before the
> > > while-loop is entered.
> > 
> > I don't understand. What data are we parsing? Why does the first
> > example need nested loops but the second doesn't? If the inner list is
> > an array that needs iteration, then you should loop in the code that
> > you replaced with "..." in the second example. If the inner list is a
> > structure like what we have to deal with list-handlers, then the first
> > example doesn't need the inner loop, it can read the struct members in
> > the outer loop.
> 
> Let's say I am parsing a list of heterogeneous elements where some
> elements contain sub-elements.
> 
> With my code, I can do something like
> 
> i = 0;
> while ((element[i] = pa_split_message_response(response, &state)))
>       i++;
> 
> parse element[3]
> parse element[5]
> 
> With your code you would have to do the parsing of the more complex
> elements within the loop. The ability to take things out of the loop in
> certain situation enhances the readability of the code.

It would certainly be useful to be able to skip uninteresting values.
There should be a function for this.

Your approach has the benefit that it's possible to reorder the parsing
of list elements, but I don't know if that's useful in practice. Maybe
it is, maybe it isn't.

> > 
> > > c) For your solution to work correctly, outermost brackets would be required
> > > while they are optional with my code. This is not important, but I thought I
> > > mention it anyway. If my approach will be followed, the big advantage of
> > > adding outer brackets is that the initial split_message_response() call
> > > would
> > > do a consistency check of the whole string before anything is processed.
> > 
> > My approach can work just fine with an implicit top-level list, since
> > that's what the code example that I gave in this mail does.
> 
> But now your code needs additional pa_read_list_start() and
> pa_read_list_end() calls for non top level lists because they
> are enclosed in brackets. pa_split_message_response() always
> assumes it receives a list without outer brackets.

What makes them "additional"? You start reading a list with
pa_split_message_response(), I start reading a list with
pa_message_read_list_start(). You have to free the list contents at the
end of the list, I have to call pa_message_read_end_list() or
pa_message_skip_rest_of_list().

That said, if pa_split_message_response() is changed to work in-place,
then you save the free call.

> > 
> > > 3) To answer your original question, in your example it is a bit confusing
> > > that there is a read_path() and a read_description() function. Both are
> > > strings, so why have different functions? The functions we are adding
> > > should not be too specialized.
> > 
> > The idea behind read_path() is that if object paths have constraints on
> > what is a valid value, then read_path() can do validation that plain
> > read_string() can't.
> 
> That's exactly what I want to avoid and where I see the risk of
> confusion. I do not want to have multiple functions with different
> error checking for the same data type. I think the error checking
> of the values does not belong into the parsing functions.

I find this point of view strange. The more the parsing functions do
error checking, the less the application needs to do error checking.
And in any case, the application is free to parse object paths as
strings if it doesn't care about strict path formatting.

In this particular case there's probably not that much benefit from
extra error checking, though. Applications aren't very likely to use
object paths in contexts where strict formatting is required. I'm fine
with not having a separate path parsing function. It can added later if
there's real need for it.

> > 
> > I hope you understand my approach better now. I don't konw if that
> > makes it any easier to accept it, though.
> > 
> > There's another topic we should discuss: how should the writing API
> > look like? This can have some effect on the parsing API as well. When
> > writing data to the parameter string, the whole string can't be easily
> > allocated all in one go, so we should have something like pa_strbuf
> > available for clients. I don't think we should add pa_strbuf at least
> > directly to the client API, because the message parameter writing has
> > some specific conventions (i.e. wrapping everything in curly braces),
> > and while it would be possible to add the specialized functions to the
> > pa_strbuf API, I think that would be somewhat ugly. So I propose we
> > introduce a new API similar to pa_strbuf.
> 
> I took the simple approach and added a function prepare_message_string()
> which returns an escaped string. So whenever there is a risk that the
> string contains any characters that need to be escaped, one can use
> prepare_message_string(string) instead of the string itself. I think it does
> not make sense to pass every value through the escaping, we only need
> it for strings and only for those strings that have a risk of containing
> unwanted characters. My pa_split_message_response() function then
> removes the escaping when it reaches the innermost braces.
> 
> Adding the braces need not be done in a special function. On the contrary,
> the caller should take care and keep track of the braces. It must do so
> anyway if it is passing lists, so the additional overhead of adding one more
> pair of brackets should be acceptable.

I think there should be convenience functions for writing strings,
floats and ints and whatever other data types we support. Manually
adding braces around the written values takes extra effort that can
easily be avoided (and writing floats correctly is not trivial anyway).

If you're not against such convenience functions, what namespace do you
suggest, if pa_message_write_string() etc. isn't good?

> > My suggestion would be to introduce a new pa_message struct that has
> > both reading and writing functions (e.g. pa_message_read_string(),
> > pa_message_write_string()). It would be analogous to DBusMessage, so it
> > would do a bit more than just string manipulation (it would at least
> > carry the recipient/sender and message name information). I'm not sure
> > you like that, because I've got the impression that you want to call
> > only the "method calls" messages, and in that case replies and signals
> > can not be called messages. Nevertheless, my preference is to call all
> > of those three things messages, and in that case it makes sense to have
> > pa_message that represents all three things, so that's what I'm going
> > to propose here.
> 
> I agree that all three can be called messages so I am fine with
> your terminology, but I do not see the need for the structure you
> propose. I have been trying to keep the interface as simple and
> transparent as possible and I don't think adding layers of abstraction
> helps achieving that goal.

Well, I think we need a bunch of functions for reading and writing
message parameters, and the functions (especially the writing
functions) are going to need some structure to keep state. pa_message
is my suggestion, but I'm open to other ideas.

> > 
> > pa_context_send_message() would take a pa_message as an argument. The
> > reply callback would get a pa_message as an argument. Signal callbacks
> > would get a pa_message as an argument. So things would work very
> > similarly to how D-Bus works. One major difference to D-Bus would be
> > that errors wouldn't be messages, at least for now (earlier there was
> > some talk about making errors more informative, and at that point using
> > pa_message for representing errors as well might make sense).
> 
> Did we not agree that the client should only receive a string as a reply?
> If we can pass structures, we don't need all the overhead of transforming
> the variables into strings.

Even if we use a pa_message struct, it's still good to have the
protocol string-based. For example, that makes "pactl send-message"
possible (or at least much easier to use than with a binary protocol).

> Concerning messages and signals, they already receive the same set
> of parameters, so again no need for some structure.
> 
> 
> Let me add one more complication: What about passing floats
> when server and client are using different locales? We probably
> have to provide some float_to_string() function to produce results
> independent on  the locale. I have been thinking about the "," or "."
> problem but could not come up with a good solution so far.
> Also which precision should be used? Full possible number of digits?

We should have a function for writing a float to the message
parameters. I don't know how exactly it should do the conversion (I
know it should use "." as the decimal point), probably we can borrow
from some existing standard... Whatever it does, it must never produce
a string that our float parsing function fails to parse.

> Anyway, how are we going to proceed from here? I still think that
> your proposal adds unnecessary complexity and is more difficult
> to implement while you don't like my idea.
> On the other hand I guess we both don't want to drop this patch
> set only because we cannot agree on parsing. Do you see some
> compromise? Should we ask Arun?

Since your splitting function can be modified to avoid mallocs, I'm not
so strongly against it any more. It looks like we're anyway going to
need a bunch of "helper" functions, so there's not that much difference
between our approaches, mainly list parsing details are affected
somewhat. How to organize the helper functions is an open question,
since you have so far opposed my pa_message idea, and haven't provided
a proper alternative for keeping parameter writing state.

Here's one more topic for discussion: could we please support null
values? One thing that has caused some trouble with D-Bus is that it
doesn't properly support null values. Generally nulls could be encoded
with "{}", but that doesn't allow distinguishing between an empty
string or list and null. Maybe that distinction is not needed?

How should reading and writing nulls work? Writing nulls is not
complicated, we could just have pa_message_write_null() (or since we
probably will have a "raw string append" function, it's trivial to just
append "{}" manually). Reading nulls is less obvious.

All reading functions could have an "is_null" parameter:

    double double_var;
    int is_null;
    ret = pa_message_read_double(msg, &double_var, &is_null);

If the is_null parameter isn't provided (i.e. is itself NULL), the
function could fail if it encounters null.

Alternatively, if it's too ugly to have the is_null parameter in all
reading functions, there could be separate "_or_null()" functions
alongside the regular functions, e.g. pa_message_read_double_or_null().

And still one more topic: we might want to support arbitrary binary
data at some point. This could be supported by escaping non-ascii bytes
with \xnn (with n being a hexadecimal number). This escaping support
should exist from the beginning. Do we perhaps want to support \n, \r
and \t as well?

-- 
Tanu

https://liberapay.com/tanuk
https://www.patreon.com/tanuk