Re: [PATCH 1/6] fast-import: tighten parsing of paths

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(Sorry for first sending as HTML)

On Mar 28, 2024, at 01:21, Patrick Steinhardt <ps@xxxxxx> wrote:
> 
> So this is part of the "filemodify" section with the following syntax:
> 
>    'M' SP <mode> SP <dataref> SP <path> LF
> 
> The way I interpret this change is that <path> could previously be empty
> (`SP LF`), but now it needs to be quoted (`SP '"' '"' LF). This seems to
> be related to cases (1) and (3) of your commit messages, where
> "filemodify" could contain an unquoted empty string whereas "filecopy"
> and "filerename" would complain about such an unquoted string.
> 
> In any case I don't see a strong argument why exactly it should be
> forbidden to have an unquoted empty path here, and I do wonder whether
> it would break existing writers of the format when we retroactively
> tighten this case. Isn't it possible to instead loosen it such that all
> three of the above actions know to handle unquoted empty paths?

At first, I strongly thought that we should forbid this case of unquoted empty
paths. It's a somewhat peculiar case in that it refers to the root of the repo
and few front-ends use it. I surveyed git fast-export, git-filter-repo,
Reposurgeon, hg-fast-export, cvs-fast-export (by Eric S. Raymond),
cvs-fast-export (by Roger Vaughn), svn2git, bzr-fastexport, and bk fast-export,
but none of them ever target the root of the repo. I assumed that if an unquoted
empty path was ever emitted, it was likely an bug that should not be accepted
(e.g., a null byte array somehow).

However, most occurrences of <path> in the grammar have allowed unquoted empty
strings to mean the root for 14 years and documentation has implied that it's
allowed for 13 years. It's just the two cases of the destination paths of
filecopy and filerename that don't allow it, and those are less-used operations,
so front-ends may never encounter that error.

Some assumed errors in emitting empty paths are caught by validation with file
modes, so even if it's loosened it's still fairly safe. filemodify must be
040000 when it targets the root, and filecopy and filerename to the root must
have a source path that's a directory. The worst case is filedelete
unintentionally targeting the root, but that's been allowed to be an unquoted
empty path for almost its entire lifetime, so I don't think should be changed.

I've changed it to allow unquoted empty paths for all operations in patch v2
3/8 (fast-import: allow unquoted empty path for root).

> This is loosening the condition so that we also accept unquoted paths
> now. Okay.

On the contrary, v1 tightens all paths to forbid unquoted empty strings. v2 now
allows it and should make that change more clear.

> On Fri, Mar 22, 2024 at 12:03:18AM +0000, Thalia Archibald wrote:
>> +/*
>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>> + * space if `allow_spaces` is nonzero.
>> + */
>> +static void parse_path(struct strbuf *sb, const char *p, const char **endp, int allow_spaces, const char *field)
>> +{
>> + strbuf_reset(sb);
>> + if (*p == '"') {
>> + if (unquote_c_style(sb, p, endp))
>> + die("Invalid %s: %s", field, command_buf.buf);
>> + } else {
>> + if (allow_spaces)
>> + *endp = p + strlen(p);
> 
> I wonder whether `stop_at_unquoted_space` might be more telling. It's
> not like we disallow spaces here, it's that we treat them as the
> separator to the next field.

I agree, but I’d rather something shorter, so I’ve changed it to `include_spaces`.

>> + else
>> + *endp = strchr(p, ' ');
>> + if (*endp == p)
>> + die("Missing %s: %s", field, command_buf.buf);
> 
> Error messages should start with a lower-case letter and be
> translateable. But these are simply moved over from the previous code,
> so I don't mind much if you want to keep them as-is.
> 
>> + strbuf_add(sb, p, *endp - p);
>> + }
>> +}


fast-import isn’t a porcelain command, AFAIK, so I thought the convention is
that its output isn't translated?

>From po/README.md:
> 
> The output from Git's plumbing utilities will primarily be read by
> programs and would break scripts under non-C locales if it was
> translated. Plumbing strings should not be translated, since
> they're part of Git's API.


I would, however, like to improve its error messages. For example, diagnosing
errors more precisely or changing the outdated “GIT” to “Git”.

To what extent should the exact error messages be considered part of Git's API?

Thalia





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux