[PATCH 6/8] message-params: Allow parameter strings to contain escaped curly braces

georg@xxxxxxxx (Georg Chini) · Mon, 30 Jul 2018 13:23:55 +0200

On 30.07.2018 11:18, Tanu Kaskinen wrote:
> On Sun, 2018-07-29 at 22:36 +0200, Georg Chini wrote:
>> On 29.07.2018 21:47, Tanu Kaskinen wrote:
>>> On Fri, 2018-07-27 at 10:51 +0200, Georg Chini wrote:
>>>> On 27.07.2018 10:08, Tanu Kaskinen wrote:
>>>>> On Thu, 2018-07-26 at 18:02 +0200, Georg Chini wrote:
>>>>>> On 26.07.2018 12:37, Tanu Kaskinen wrote:
>>>>>>> On Sun, 2018-07-22 at 21:11 +0200, Georg Chini wrote:
>>>>>>>> On 22.07.2018 17:48, Tanu Kaskinen wrote:
>>>>>>>>> On Sun, 2018-07-22 at 16:02 +0200, Georg Chini wrote:
>>>>>>>>>> On 21.07.2018 20:17, Tanu Kaskinen wrote:
>>>>>>>>>>> On Mon, 2018-04-09 at 19:35 +0200, Georg Chini wrote:
>>>>>>>>>>>> +
>>>>>>>>>>>> +/* Read functions */
>>>>>>>>>>>> +
>>>>>>>>>>>>        /* Split the specified string into elements. An element is defined as
>>>>>>>>>>>>         * a sub-string between curly braces. The function is needed to parse
>>>>>>>>>>>>         * the parameters of messages. Each time it is called returns the position
>>>>>>>>>>>>         * of the current element in result and the state pointer is advanced to
>>>>>>>>>>>> - * the next list element.
>>>>>>>>>>>> + * the next list element. On return, the parameter *is_unpacked indicates
>>>>>>>>>>>> + * if the string is plain text or contains a sub-list. is_unpacked may
>>>>>>>>>>>> + * be NULL.
>>>>>>>>>>> is_unpacked looks like unnecessary complexity.
>>>>>>>>>>> pa_message_params_read_string() should always unescape the value.
>>>>>>>>>>
>>>> This is not about parameter type, it is just to distinguish between
>>>> a list and a simple value. One example comes to my mind immediately:
>>>> Consider a parameter list that consists of strings and a couple of
>>>> arrays. Now you can read this list as an array of strings (patch 8
>>>> provides a function for that) and then examine those strings that
>>>> contain arrays separately. Having the flag (and using it in read_string())
>>>> provides a more flexible and convenient way to parse a parameter list.
>>>>
>>>> The flag may not be strictly necessary at the moment, but I would still
>>>> like to keep it.
>>> To continue a familiar theme of my review: if there's a
>>> read_string_array() function, I want that to be used only for string
>>> arrays, not a mishmash of random types. There could be a separate
>>> function split_list_into_array() that does something similar to
>>> what you wanted to do with read_string_array().
>>> split_list_into_array() wouldn't do special string handling,
>>> though, so unescaping would be left to the application. I find that
>>> only logical, since other basic types don't get special handling
>>> either (i.e. floats aren't converted to C floats).
>>>
>>> Your use case could be served with a vararg function that instead of
>>> producing a string array would read into separate variables, like this:
>>>
>>> pa_message_params_read(c, state,
>>>                          PA_TYPE_STRING, &param1,
>>>                          PA_TYPE_FLOAT, &param2,
>>>                          PA_TYPE_RAW, &param3,
>>>                          0);
>>>
>>> PA_TYPE_RAW would be useful for reading a list (or anything else) into
>>> a C string without unescaping or other processing. There could be also
>>> PA_TYPE_IGNORE for uninteresting variables, and PA_TYPE_*_ARRAY for
>>> arrays of basic types.
>>>
>>> (Now it occurred to me that the 'c' argument name that is used in the
>>> parsing functions is a bit weird and unhelpful. Maybe "param_list"
>>> would be good?)
>>>
>> I still don't see your point. Again, the use of is_unpacked is
>> transparent for the
>> user of read_string(), so the user just reads a string without having to
>> care about
>> unescaping. The flag does not complicate the API, it simplifies it
>> because the
>> unescaping is done automatically.
> The flag complicates the split_list() function parameters, but also the
> read_string() semantics: you need to explain to the user that unlike
> all the other read_foo() functions, read_string() can read any value
> and unescaping becomes conditional because of this.

read_string() is not supposed to read any value, it is only supposed
to read strings. Actually, the user does not need to know anything
about escaping, because read_string() and write_string() do the
necessary escaping/unescaping completely transparent for the user.

The is_unpacked flag is at least useful to check if something returned
by split_list() is really a simple type and not a structure or array. I am
currently not using it in the read_foo() functions, but I think I should.

>
>> Your approach seems unnecessary
>> complicated to me. A string is a string, even if it contains
>> sub-structures. Your
>> split_list_into_array() function would basically return an array of
>> strings, so what
>> would be the advantage? It would only make parsing more cumbersome, because
>> the task of unescaping is given to the user instead of doing it
>> automatically where
>> necessary.
> That's why I came up with the vararg function. I agree that
> split_list_into_array() is unlikely to be very useful.
>
>> Also there is no "mishmash of random types", they are all strings It is
>> only the
>> difference between a "simple" string which needs no further processing and a
>> "complex" string which needs further parsing.
> The API defines types for the parameters. There are no separate
> "simple" and "complex" strings in the type system. The string type is
> different than the list type. What you call complex strings are in my
> mind just the raw unprocessed serialized data, which is a different
> abstraction level than the values that the various read_foo() functions
> return. It feels just very wrong to have a function that returns values
> in both domains: unparsed raw data and parsed values. Especially when
> the function is a family of read_foo() functions where all other
> functions only operate on their designated types.

Yes, that is some additional functionality that the read_string()
function provides on top of reading what I called "simple strings"
above. But that does not hinder the function to work exactly
like expected when reading a plain string. Maybe a better name
for the function would be read_string_element(), indicating that
it can read either a plain string or an element of the parameter
list as a string.
I could split this into two functions, read_string() which always
does unescaping and read_element() which would be a wrapper
around split_list() and would never do unescaping. In both functions,
the is_unpacked flag can be useful to check if the input matches
the type "plain string" or "serialized data".

>
> Compare to D-Bus: if you say that "they are all strings", then in D-Bus
> the same would be "they are all byte arrays", since that's what they
> are in the message buffer. I doubt you would suggest that the
> dbus_message_iter_read_byte_array() function (if such thing existed)
> should be usable on arbitrarily typed values rather than just byte
> array values, and should return the raw serialized data regardless of
> its type. Here you're suggesting that read_string() should be able to
> return the raw serialized data regardless of its type.
>
> Do you understand what I'm getting at? You may disagree that having
> strict separation of concerns between raw serialized data and parsed
> values is beneficial for the user-friendliness of the API, but if you
> don't even understand what I mean by that separation, then discussing
> this is very hard.
>
I do understand your reasoning and technically we surely could drop
the flag. I just don't want it because for me using read_string() is a
more convenient way of parsing a parameter list than using split_list()
directly. It's just a simple way to say "let's skip this array or structure
for the moment and store it somewhere for later inspection". So I can
do something like

x = read_float()
y = read_int()
my_structure = read_string()
z = read_bool()
a = read_string()

and then examine my_structure at a later time. Using split_list() instead
of read_string() is somehow breaking the flow for me. We could do the
same with the read_element() function I suggested above (and then
maybe make split_list() a purely internal function?)