On Wed, Oct 13, 2010 at 11:06:44AM +0200, Matthias Bolte wrote: > VMware uses a mix of percent-, pipe- and base64-encoding in > different combinations in different places. > > Add a testcase for this. > --- > src/esx/README | 25 ++++ > src/esx/esx_driver.c | 72 ++++++----- > src/esx/esx_storage_driver.c | 42 ++++++- > src/esx/esx_util.c | 198 ++++++++++++++++++++++++++++++ > src/esx/esx_util.h | 18 +++ > src/esx/esx_vi.c | 6 + > src/esx/esx_vmx.c | 88 +++++--------- > tests/esxutilstest.c | 51 ++++++++ > tests/xml2vmxdata/xml2vmx-annotation.vmx | 2 +- > 9 files changed, 405 insertions(+), 97 deletions(-) That sounds vaguely familiar, I think I reviewed such a patch last month, right ? [...] > > - virBufferURIEncodeString(&buffer, def->name); > + escapedName = esxUtil_EscapeDatastoreItem(def->name); > + okay, we really need to make sure the escaping is XML compatible though > @@ -621,3 +622,200 @@ esxUtil_ReformatUuid(const char *input, char *output) [...] > +char * > +esxUtil_EscapeDatastoreItem(const char *string) > +{ > + char *replaced = strdup(string); > + char *escaped1; > + char *escaped2 = NULL; > + > + if (replaced == NULL) { > + virReportOOMError(); > + return NULL; > + } > + > + esxUtil_ReplaceSpecialWindowsPathChars(replaced); > + > + escaped1 = esxUtil_EscapeHexPercent(replaced); > + > + if (escaped1 == NULL) { > + goto cleanup; > + } > + > + escaped2 = esxUtil_EscapeBase64(escaped1); > + Okay none of those may lead to a problem for XML ... [...] > --- a/tests/esxutilstest.c > +++ b/tests/esxutilstest.c > @@ -227,6 +227,56 @@ testConvertDateTimeToCalendarTime(const void *data ATTRIBUTE_UNUSED) > > > > +struct testDatastoreItem { > + const char *string; > + const char *escaped; > +}; > + > +static struct testDatastoreItem datastoreItems[] = { > + { "normal", "normal" }, > + { "AÃ1Ã2Ã3Ã4#5~6!7Â8/9%Z", > + "A+w6Q-1+w7Y-2+w7w-3+w58-4+Iw-5+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-25Z" }, > + { "Z~6!7Â8/9%0#1\"2'3`4&A", > + "Z+fg-6+IQ-7+wqc-8+JQ-2f9+JQ-250+Iw-1_2'3+YA-4+Jg-A" }, > + { "ææè", "+5qiZ5rqW6Kqe" }, > + { "!\"#$%&'()*+,-./0123456789:;<=>?", > + "+IQ-_+IyQl-25+Jg-'()_+Kw-,-.+JQ-2f0123456789_+Ow-_+PQ-__" }, > + { "A Z[\\]^_B", "A Z+WyU-5c+XV4-_B" }, > + { "A`B@{|}~DEL", "A+YA-B+QHs-_+fX4-DEL" }, > + { "hÃÃÃÃÃÃH", "h+w4DDgcOCw4PDhMOF-H" }, > + { "AìZ", "A+7L+A-Z" }, > + { "!ìA", "+Iey,gA-A" }, > + { "~~~", "+fn5+" }, > + { "~~~A", "+fn5+-A" }, > + { "K%U/H\\Z", "K+JQ-25U+JQ-2fH+JQ-5cZ" }, > +}; Ouch :-) One question for the C purists. How do we know how characters outside of the ASCII range may be interpreted by a compiler ? The mail as received is made of one mime part, not multipart, it's defined as Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 so my mail reader probably assumes that this is utf8 characters, carried as a base64 encoded chunk of ASCII. Since my terminal is displaying in UTF-8 I apparently see your patch as it's expected, but as soon as it's saved on disk, most Unix/Linux tools will interpret those character using the locale of the user, i.e. if I were to compile this in a terminal using a non-UTF-8 locale, the compiler may very well interpret this slightly differently, okay this is data so maybe it's just bytes being copied without risk, but I'm still a bit worried there. I like the idea of testing this API seriously, but at the C level shouldn't we encode those in an ASCII way to be sure ? Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@xxxxxxxxxxxx | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list