Re: [PATCH v2 3/5] Extend nwfilter schema to accept comment attributes

Stefan Berger <stefanb@xxxxxxxxxx> · Tue, 28 Sep 2010 16:06:14 -0400

Eric Blake <eblake@xxxxxxxxxx> wrote on 09/28/2010 03:26:48 PM:

> On 09/28/2010 04:28 AM, Stefan Berger wrote:
> >> okay.  It also leaves out 8-bit bytes - could that be a problem for i18n
> >> where people want comments with native-language accented characters?
> >> That is, are we being too strict here?  Maybe a better pattern would be
> >> to reject specific non-printing ASCII bytes we want to avoid, assuming
> >> you can use escape sequences like [^\001]?
> >
> > Looking at
> >
> > http://www.asciitable.com/
> >
> > I should probably include 0x20-0x7E and 128-175, 224-238 - maybe even
> > more? So the regex then becomes
> >
> > [&#x20;-&#x7E;&#128;-&#175;&#224;-&#238;]{0,256}
>
> True ASCII is strictly 7-bit; any locale where isprint() returns true on
> 8-bit bytes is a superset single-byte encoding, such as ISO-8859-1, or
> 'extended ascii' from the URL you posted above.  But I'm also thinking
> about multi-byte encodings, like UTF-8, where we cannot a priori write a
> regex that will accept all valid Unicode printable characters, in part
> because you have to look at more than one byte at a time to determine if
> you have a printable character.  Which goes back to my suggestion of an
> inverse charset - rejecting bytes that are known to be non-printable
> ASCII, and letting everything else through, whether or not it is a
> printable byte sequence in the current locale.  So what about this idea:
> exclude control characters except for tab, and let space and everything
> after through (I don't know if it needs to be adjusted to also reject
> &#x00;):
>
> [^&#x01;-&#x08;&#x0A;-&#x1F;]{0,256}

Fine by me. We may just give the impression of accepting Unicode
while the code does not handle it.
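
For illustration only, here is a minimal Python sketch that compares the
two patterns from this thread on a few sample comments. It assumes
Python's re module as a stand-in for the schema's regex engine, which may
behave differently; the names INVERSE_RE, ALLOWLIST_RE and accepts() are
hypothetical and not part of libvirt.

```python
import re

# Inverse class proposed in the thread: reject U+0001-U+0008 and
# U+000A-U+001F, allow tab (U+0009), space, and everything above.
# Note that NUL (U+0000) and DEL (U+007F) are not excluded by it.
INVERSE_RE = re.compile(r"[^\x01-\x08\x0a-\x1f]{0,256}")

# Earlier allowlist from the thread: 0x20-0x7E, 128-175, 224-238.
ALLOWLIST_RE = re.compile(r"[\x20-\x7e\x80-\xaf\xe0-\xee]{0,256}")


def accepts(pattern: re.Pattern, comment: str) -> bool:
    """Return True if the whole comment matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(comment) is not None


if __name__ == "__main__":
    samples = [
        "allow ssh\tfrom mgmt",     # contains a tab
        "r\u00e8gle caf\u00e9",     # accented letters inside 224-238
        "se\u00f1al",               # n-tilde is 241, outside 224-238
        "bell\x07",                 # control character
        "two\nlines",               # newline (0x0A)
    ]
    for s in samples:
        print(repr(s), accepts(INVERSE_RE, s), accepts(ALLOWLIST_RE, s))
```

In this sketch the {0,256} limit counts characters, not bytes, so a
comment of 256 accented characters passes the check yet takes 512 bytes
once encoded as UTF-8. Code that stores the comment in a fixed-size byte
buffer would have to handle that separately.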

   Stefan

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list