On Tue, Oct 22, 2013 at 9:13 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Antoine Pelisse <apelisse@xxxxxxxxx> writes:
>
>> git-fast-import documentation says that paths can be C-style quoted.
>> Unfortunately, the current remote-hg helper doesn't unquote quoted
>> path and pass them as-is to Mercurial when the commit is created.
>>
>> This result in the following situation:
>>
>>  - clone a mercurial repository with git
>>  - Add a file with space: `mkdir dir/foo\ bar`

Note to self: mkdir doesn't create a "file".

>>  - Commit that new file, and push the change to mercurial
>>  - The mercurial repository as now a new directory named '"dir', which
>>    contains a file named 'foo bar"'
>>
>> Use python ast.literal_eval to unquote the string if it starts with ".
>> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>>
>> Signed-off-by: Antoine Pelisse <apelisse@xxxxxxxxx>
>> ---
>
> A path you read in fast-import input indeed needs to be unquoted
> when it begins with a dq, and I _think_ by using ast.literal_eval(),
> you probably can correctly unquote any valid C-quoted string.
>
> But it bothers me somewhat that what the patch does seems to be
> overly broad.  Doesn't ast.literal_eval() take a lot more than just
> strings?

Good point.

>     ast.literal_eval(node_or_string)
>
>     Safely evaluate an expression node or a Unicode or Latin-1
>     encoded string containing a Python expression. The string or
>     node provided may only consist of the following Python literal
>     structures: strings, numbers, tuples, lists, dicts, booleans,
>     and None.

Fortunately, I don't believe any of the other types can start with a dq,
so currently I don't believe we can end up with anything but a string.
We could certainly check that this is always true, though.

> Also doesn't Python's double-quoted string have a lot more magic
> than C-quoted string, e.g.
>
>     $ python -i
>     >>> import ast
>     >>> not_cq_path = '"abc" "def"'
>     >>> ast.literal_eval(not_cq_path)
>     'abcdef'

It is true that I have been expecting "valid output" from git-fast-export,
and I can't think of an easy way to detect output that is broken yet
still accepted as a valid string by Python.
We could obviously write an unquote_c_style() equivalent in Python if needed.

Thanks,
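
For reference, here is a minimal sketch of what such an unquote_c_style()
equivalent in Python might look like. It is not the code from the patch, and
it is only an illustration: the escape set (\a \b \f \n \r \t \v \" \\ and
three-digit octal) is my assumption about what git emits, and the function
takes and returns bytes. Unlike ast.literal_eval(), it rejects input such as
'"abc" "def"' because nothing may follow the closing quote.

```python
# Hypothetical sketch; the escape table below is an assumption about
# git's C-style path quoting, not something taken from the patch.
_SIMPLE_ESCAPES = {
    ord('a'): 0x07, ord('b'): 0x08, ord('f'): 0x0c, ord('n'): 0x0a,
    ord('r'): 0x0d, ord('t'): 0x09, ord('v'): 0x0b,
    ord('"'): ord('"'), ord('\\'): ord('\\'),
}


def unquote_c_style(quoted):
    """Unquote exactly one C-style quoted path (bytes in, bytes out).

    Raises ValueError if the input is not a single, well-formed
    double-quoted string.
    """
    if len(quoted) < 2 or quoted[:1] != b'"':
        raise ValueError('not a quoted path: %r' % (quoted,))
    out = bytearray()
    i, n = 1, len(quoted)
    while i < n:
        c = quoted[i]
        if c == ord('"'):
            # The closing quote must be the very last byte.
            if i != n - 1:
                raise ValueError('data after closing quote: %r' % (quoted,))
            return bytes(out)
        if c != ord('\\'):
            # Ordinary byte, copied through unchanged.
            out.append(c)
            i += 1
            continue
        # Backslash: look at the byte that follows it.
        i += 1
        if i >= n:
            raise ValueError('dangling backslash: %r' % (quoted,))
        c = quoted[i]
        if c in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[c])
            i += 1
        elif ord('0') <= c <= ord('3'):
            # Octal escape: exactly three octal digits, value 0..255.
            digits = quoted[i:i + 3]
            if len(digits) != 3 or not all(
                    ord('0') <= d <= ord('7') for d in digits):
                raise ValueError('bad octal escape: %r' % (quoted,))
            out.append(int(digits, 8))
            i += 3
        else:
            raise ValueError('unknown escape: %r' % (quoted,))
    raise ValueError('missing closing quote: %r' % (quoted,))
```

With this, b'"dir/foo bar"' unquotes to b'dir/foo bar', while the
'"abc" "def"' example above raises ValueError instead of being silently
concatenated.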