Re: [PATCH spice-gtk 3/4] util: add unix2dos and dos2unix

Hans de Goede <hdegoede@xxxxxxxxxx> · Sat, 24 Aug 2013 15:07:04 +0200

Hi,

On 08/24/2013 02:56 PM, Marc-André Lureau wrote:

----- Original Message -----
Hi,

On 08/24/2013 02:32 PM, Marc-André Lureau wrote:

----- Original Message -----
Hi,

On 08/24/2013 02:17 PM, Marc-André Lureau wrote:

<snip>

+
+    if (!g_utf8_validate(str, len, NULL)) {
+        g_set_error_literal(error, G_CONVERT_ERROR,
+                            G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+                            "Invalid byte sequence in conversion input");
+        return -1;
+    }

And once you simply treat this as a regular C-string without worrying
about multi-byte encodings you can also drop this.

Actually, during implementation, I have encountered/produced invalid
utf8 that will break later on in gtk+, so I prefer to validate the
production.

Thinking more about this, if we want to do utf-8 validation, it should not
be done here, but rather in gtk/channel-main.c, since this code only gets
called in certain guest-line-end + direction cases, and if we want to do
utf-8 validation we should always do it.

Perhaps, although the difference is that here we do parse/modify the
string, so it's important to check we don't produce garbage.

Right, but since garbage in = garbage out, you're not only checking that
the conversion code did not foo-bar, you're also validating the original
input, at which point it makes sense to me to always do that even when not
doing conversion.

In one case, it's a pass-through: the caller and the destination are responsible for validation.

But here, we do parse and modify, so it's necessary to validate.

I am not strictly against validating UTF-8 all the time, but I don't think that belongs in the messenger.

I agree that validation is best left up to the receiver, but in that case we should simply
never verify, as I suggested in the first place. Line-ending conversion only inserts / removes
single-byte characters, and since these can never be part of a multi-byte character in UTF-8,
we cannot make the input any more (or less) broken than it was.
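
To illustrate the point about single-byte characters, here is a minimal sketch
in plain C. It is not the code from the patch: sketch_unix2dos() and
sketch_same_high_bytes() are hypothetical names, and the conversion treats the
input as opaque bytes. In UTF-8 every byte of a multi-byte sequence is 0x80 or
above, while CR (0x0D) and LF (0x0A) are below 0x80, so the byte-wise test can
never match inside a multi-byte character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the function from the patch: convert bare LF to
 * CRLF, treating the input as opaque bytes. Returns a malloc'ed,
 * NUL-terminated buffer (caller frees) or NULL on allocation failure. */
char *sketch_unix2dos(const char *in, size_t len, size_t *out_len)
{
    char *out;
    size_t i, o = 0;

    assert(in != NULL || len == 0);

    /* Worst case: every byte is a LF and grows to two bytes. */
    out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        /* '\n' and '\r' are below 0x80, so this comparison can never match
         * a lead or continuation byte of a multi-byte UTF-8 sequence. */
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out[o++] = '\r';
        out[o++] = in[i];
    }
    out[o] = '\0';
    if (out_len != NULL)
        *out_len = o;
    return out;
}

/* Hypothetical check: returns 1 if the subsequence of bytes >= 0x80 is
 * identical in both buffers, i.e. the non-ASCII content was left untouched. */
int sketch_same_high_bytes(const char *a, size_t alen,
                           const char *b, size_t blen)
{
    size_t i = 0, j = 0;

    for (;;) {
        /* Skip ASCII bytes on both sides. */
        while (i < alen && (unsigned char)a[i] < 0x80)
            i++;
        while (j < blen && (unsigned char)b[j] < 0x80)
            j++;
        if (i == alen || j == blen)
            return i == alen && j == blen;
        if (a[i] != b[j])
            return 0;
        i++;
        j++;
    }
}
```

Feeding this a buffer that contains both a valid two-byte sequence and a stray
0xFF byte gives output whose non-ASCII bytes are identical to the input's, in
the same order: valid input stays valid and broken input stays broken in
exactly the same way.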

I really think we are doing ourselves a disservice by validating only when doing line-ending
conversion, since we will then likely get difficult-to-debug bugs, where we get invalid UTF-8
in and end up rejecting it only in some cases (while most receivers will likely accept it and
make the best of it). If we follow the reasoning that the receiver should validate (and decide
whether to reject outright, or simply insert some ? chars or some such) to its logical conclusion,
we should simply never validate.
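
For what "insert some ? chars" could look like on the receiving side, here is
a rough sketch in plain C. It is only an assumption about one possible policy,
not what gtk+ or any existing receiver does, and sketch_utf8_seq_len() and
sketch_utf8_sanitize() are hypothetical names. Each byte that does not start a
well-formed UTF-8 sequence is overwritten with '?', so the length of the
buffer does not change.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length (1..4) of the well-formed UTF-8
 * sequence starting at s, or 0 if the bytes there are invalid or truncated. */
size_t sketch_utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c;
    unsigned long cp, min;
    size_t need, i;

    if (avail == 0)
        return 0;
    c = s[0];
    if (c < 0x80)
        return 1;

    if (c >= 0xC2 && c <= 0xDF) {
        need = 2; cp = c & 0x1F; min = 0x80;
    } else if ((c & 0xF0) == 0xE0) {
        need = 3; cp = c & 0x0F; min = 0x800;
    } else if (c >= 0xF0 && c <= 0xF4) {
        need = 4; cp = c & 0x07; min = 0x10000;
    } else {
        return 0; /* stray continuation byte or invalid lead byte */
    }

    if (avail < need)
        return 0; /* truncated sequence */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0; /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return need;
}

/* Hypothetical receiver-side policy: overwrite every byte that does not
 * begin a well-formed sequence with '?'. Returns the number of bytes
 * replaced; the buffer length is unchanged. */
size_t sketch_utf8_sanitize(char *buf, size_t len)
{
    size_t i = 0, replaced = 0;

    while (i < len) {
        size_t n = sketch_utf8_seq_len((const unsigned char *)buf + i,
                                       len - i);
        if (n == 0) {
            buf[i] = '?';
            replaced++;
            i++;
        } else {
            i += n;
        }
    }
    return replaced;
}
```

A receiver that prefers to reject outright could call sketch_utf8_seq_len()
in the same loop and fail on the first 0 instead of patching the buffer.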

Regards,

Hans
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/spice-devel