(Sorry for first sending as HTML) On Mar 28, 2024, at 01:21, Patrick Steinhardt <ps@xxxxxx> wrote: > > So this is part of the "filemodify" section with the following syntax: > > 'M' SP <mode> SP <dataref> SP <path> LF > > The way I interpret this change is that <path> could previously be empty > (`SP LF`), but now it needs to be quoted (`SP '"' '"' LF). This seems to > be related to cases (1) and (3) of your commit messages, where > "filemodify" could contain an unquoted empty string whereas "filecopy" > and "filerename" would complain about such an unquoted string. > > In any case I don't see a strong argument why exactly it should be > forbidden to have an unquoted empty path here, and I do wonder whether > it would break existing writers of the format when we retroactively > tighten this case. Isn't it possible to instead loosen it such that all > three of the above actions know to handle unquoted empty paths? At first, I strongly thought that we should forbid this case of unquoted empty paths. It's a somewhat peculiar case in that it refers to the root of the repo and few front-ends use it. I surveyed git fast-export, git-filter-repo, Reposurgeon, hg-fast-export, cvs-fast-export (by Eric S. Raymond), cvs-fast-export (by Roger Vaughn), svn2git, bzr-fastexport, and bk fast-export, but none of them ever target the root of the repo. I assumed that if an unquoted empty path was ever emitted, it was likely an bug that should not be accepted (e.g., a null byte array somehow). However, most occurrences of <path> in the grammar have allowed unquoted empty strings to mean the root for 14 years and documentation has implied that it's allowed for 13 years. It's just the two cases of the destination paths of filecopy and filerename that don't allow it, and those are less-used operations, so front-ends may never encounter that error. Some assumed errors in emitting empty paths are caught by validation with file modes, so even if it's loosened it's still fairly safe. filemodify must be 040000 when it targets the root, and filecopy and filerename to the root must have a source path that's a directory. The worst case is filedelete unintentionally targeting the root, but that's been allowed to be an unquoted empty path for almost its entire lifetime, so I don't think should be changed. I've changed it to allow unquoted empty paths for all operations in patch v2 3/8 (fast-import: allow unquoted empty path for root). > This is loosening the condition so that we also accept unquoted paths > now. Okay. On the contrary, v1 tightens all paths to forbid unquoted empty strings. v2 now allows it and should make that change more clear. > On Fri, Mar 22, 2024 at 12:03:18AM +0000, Thalia Archibald wrote: >> +/* >> + * Parse the path string into the strbuf. It may be quoted with escape sequences >> + * or unquoted without escape sequences. When unquoted, it may only contain a >> + * space if `allow_spaces` is nonzero. >> + */ >> +static void parse_path(struct strbuf *sb, const char *p, const char **endp, int allow_spaces, const char *field) >> +{ >> + strbuf_reset(sb); >> + if (*p == '"') { >> + if (unquote_c_style(sb, p, endp)) >> + die("Invalid %s: %s", field, command_buf.buf); >> + } else { >> + if (allow_spaces) >> + *endp = p + strlen(p); > > I wonder whether `stop_at_unquoted_space` might be more telling. It's > not like we disallow spaces here, it's that we treat them as the > separator to the next field. I agree, but I’d rather something shorter, so I’ve changed it to `include_spaces`. >> + else >> + *endp = strchr(p, ' '); >> + if (*endp == p) >> + die("Missing %s: %s", field, command_buf.buf); > > Error messages should start with a lower-case letter and be > translateable. But these are simply moved over from the previous code, > so I don't mind much if you want to keep them as-is. > >> + strbuf_add(sb, p, *endp - p); >> + } >> +} fast-import isn’t a porcelain command, AFAIK, so I thought the convention is that its output isn't translated? >From po/README.md: > > The output from Git's plumbing utilities will primarily be read by > programs and would break scripts under non-C locales if it was > translated. Plumbing strings should not be translated, since > they're part of Git's API. I would, however, like to improve its error messages. For example, diagnosing errors more precisely or changing the outdated “GIT” to “Git”. To what extent should the exact error messages be considered part of Git's API? Thalia