On Mon, Mar 30, 2015 at 10:56:11AM -0600, Eric Blake wrote:
> On 03/30/2015 09:50 AM, Daniel Veillard wrote:
>
> >> NACK.  Stripping control codes from a volume name represents the wrong
> >> name.  We need to escape the problematic bytes, rather than strip them.
> >
> > you can't escape them with a CharRef for sure
> >
> > http://www.w3.org/TR/REC-xml/#wf-Legalchar
> >   Characters referred to using character references must match the
> >   production for Char.
> >
> > That time Ján is right :-)
>
> Ouch.  Then how do we represent the name of a storage volume, when the
> file system allows arbitrary bytes including control characters, in the
> volume name, but where we are restricted to only using valid XML?  Do we
> just silently ignore such files as impossible volumes that libvirt
> cannot manage?  (I'd rather omit such a volume from the list in the
> pool, than silently munge its name into something incorrect)

  If such an invalid CharRef were to hit libxml2 you would get
a parser error and no result, so you can safely assume nobody has ever
experienced those.

  Then you can try to push an additional patch doing a libvirt-level
escaping, but only of those problematic characters, prior to the encoding
in the XML. Then unescape them when reading from the XML into libvirt
internals. This should not affect any deployed instance, since their XML
would be unparseable if that was the case.

  I would suggest using the same charref escaping, but applied before
passing to XML, e.g.

  real path:        /foo\3bar
  libvirt encoded:  /foo&#3;bar
  XML encoded:      /foo&amp;#3;bar

you also need to catch & and give it special status

  real path:        /foo&bar
  libvirt encoded:  /foo&#38;bar
  XML encoded:      /foo&amp;#38;bar

  After libvirt parses the XML you end up with /foo&#3;bar, and each time
you see &#numericsequence; you translate that to the equivalent UTF-8
character.
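  To make the round trip concrete, here is a rough sketch in C of the two
directions. This is not libvirt code: name_escape(), name_unescape() and
needs_escape() are hypothetical names made up for the illustration, and the
sketch assumes only '&' and the control bytes that Char excludes get
escaped, always as a decimal &#N; sequence.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true for bytes that get the libvirt-level escape,
 * i.e. '&' itself plus control bytes other than TAB, LF and CR. */
static int needs_escape(unsigned char c)
{
    if (c == '&')
        return 1;
    return c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0D;
}

/* Raw name -> "libvirt encoded" name. Each problematic byte becomes
 * "&#N;" with N in decimal. Returns malloc'ed memory, or NULL on OOM. */
char *name_escape(const char *raw)
{
    size_t len = strlen(raw);
    /* Worst case: every byte expands to 5 characters, e.g. "&#38;". */
    char *out = malloc(len * 5 + 1);
    char *p = out;

    if (!out)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)raw[i];
        if (needs_escape(c))
            p += sprintf(p, "&#%u;", (unsigned)c);
        else
            *p++ = (char)c;
    }
    *p = '\0';
    return out;
}

/* "libvirt encoded" name -> raw name. Only decimal sequences with a value
 * in 1..127 are translated (all the escaper above can emit); anything
 * else is copied through unchanged. */
char *name_unescape(const char *enc)
{
    char *out = malloc(strlen(enc) + 1);  /* output is never longer */
    char *p = out;

    if (!out)
        return NULL;
    while (*enc) {
        if (enc[0] == '&' && enc[1] == '#' &&
            isdigit((unsigned char)enc[2])) {
            char *end;
            unsigned long v = strtoul(enc + 2, &end, 10);
            if (*end == ';' && v > 0 && v < 0x80) {
                *p++ = (char)v;
                enc = end + 1;
                continue;
            }
        }
        *p++ = *enc++;
    }
    *p = '\0';
    return out;
}
```

  Because '&' is always escaped on the way in, a file whose real name
contains the literal text &#3; still survives the round trip: it is stored
as &#38;#3; and comes back as &#3;. The XML layer then does its own &amp;
encoding on top of whatever name_escape() returns.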
  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
           [#x10000-#x10FFFF]

  As a first approach, I would suggest just detecting bytes 1-8 and
0xB-0x1F and giving them the treatment; the probability of hitting
surrogates in UTF-8 filenames seems low enough that the patch should work
in general. Whether using /foo&#3;bar vs. /foo&#x3;bar is a matter of
taste, you only need to handle one IMHO.

  Add a little regression test with all the lower characters and & used in
the path and I think you're covered.

  Sounds too late for 1.2.14 though,

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard@xxxxxxxxxx  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list