On Wed, Mar 14, 2012 at 10:40:41AM -0600, Eric Blake wrote: > On 03/14/2012 09:19 AM, Daniel P. Berrange wrote: > > From: "Daniel P. Berrange" <berrange@xxxxxxxxxx> > > > > To facilitate parsing of argv[] style strings, provide a > > virStrSplitQuoted API which will split a string on the listed > > separators, but also allow for quoting with ' or ". > > > > * src/libvirt_private.syms, src/util/util.c, > > src/util/util.h: Implement virStrSplitQuoted > > * tests/utiltest.c: Some tests for virStrSplitQuoted > > --- > > src/libvirt_private.syms | 1 + > > src/util/util.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++ > > src/util/util.h | 2 + > > tests/utiltest.c | 89 ++++++++++++++++++++++++++++++++-- > > 4 files changed, 203 insertions(+), 6 deletions(-) > > This looks like it is repeating some of the code in > virsh.c:vshCommandStringGetArg; any chance we can combine them? In > particular, the ability to mimic shell handling of \ escapes, as well as > the difference in behavior of \ inside "" vs. '', seems like it will > come in handy. Ah I had forgotten about that code. Can you clarify the difference in \ handling. I guess you mean that inside '', \ can only be used for \\ and \', while inside "", it can do all the standard shell escapes like \t, \n, etc ? > > > > > + > > +static char * > > +virStrDupUnescape(const char *src, size_t len) > > +{ > > + char *ret; > > + size_t i, j; > > + bool escape = false; > > + > > + if (VIR_ALLOC_N(ret, len + 1) < 0) > > + return NULL; > > + > > + for (i = 0, j = 0 ; i < len ; i++) { > > + if (escape) { > > + escape = false; > > + ret[j++] = src[i]; > > + } else if (src[i] == '\\') { > > + escape = true; > > + } else { > > + ret[j++] = src[i]; > > + } > > So this version only strips backslash. Looks okay in isolation. > > > +/** > > + * virStrSplitQuoted: > > + * @src: string to split > > + * @sep: list of separator characters > > + * > > + * Split the string 'src' into chunks, separated by one or more of > > + * the characters in 'sep'. Allow ' or " to quote strings even if > > + * they contain 'sep' > > No documentation of backslash handling. Are we trying to emulate shell > parsing here (where depending on outer, "", or '', \ behaves differently)? > > Or are we trying to emulate C string parsing, where \n is translated to > newline? > > What you have here does neither; although I didn't spot any flaw in the > code, I don't know if it's the algorithm we want to be using. I should have sent this paired with my other patch for <cmdline> handling in LXC. That is the intended use case for this function. I'm not sure that anyone has ever clearly defined what escaping syntax is used for /proc/cmdline (which is what <cmdline> is representing. SystemD's parser is what I modelled my code on, though it in fact does not unescape anything. eg. if parsing foo "bar \" wizz" eek systemd seems to return foo bar \" wizz eek while I return foo bar " wizz eek > > +struct StringSplitData { > > + const char *src; > > + const char *sep; > > + const char **bits; > > +}; > > A point in your favor, for at least testing what you parse! If we > change our mind to mimic shell or C parsing, then we'd have to update > these tests. Yes, escaping rules blow my mind unless I can test them :-) > > + const char *bits1[] = { "foo", "bar", NULL }; > > + DO_TEST_STRING("foo bar", " ", bits1); > > + DO_TEST_STRING("foo 'bar'", " ", bits1); > > + DO_TEST_STRING("foo \"bar\"", " ", bits1); > > + DO_TEST_STRING(" foo \"bar\"", " ", bits1); > > + DO_TEST_STRING(" foo \"bar\"", " ", bits1); > > + DO_TEST_STRING(" foo \"bar\"\n ", " \t\n\r", bits1); > > + > > + const char *bits2[] = { "foo", "bar wizz", "eek", NULL }; > > + DO_TEST_STRING("foo 'bar wizz' eek", " ", bits2); > > + DO_TEST_STRING("foo \"bar wizz\" eek", " ", bits2); > > What about "foo bar\ wizz eek", if \ can escape outside of quotes? Hmm, possibly > > + > > + const char *bits3[] = { "foo", "'bar' \"wizz\"", "eek", NULL }; > > It would also be nice to have a literal backslash in the expected > results, to prove that we can properly escape them. Good point. > Overall, I like the idea of the new function, but I'm worried that > introducing yet another parser could hurt us (users will be asking "now > which escape style is in effect here, and how does it differ from > standardized escape styles that I'm used to?"). I think that perhaps we should have virStrSplitQuoted just return the split pieces, with *no* unescaping. And then have separate functions to escape/unescape individual string pieces after the fact. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list