Re: [PATCH v4 1/8] fast-import: tighten path unquoting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Apr 12, 2024, at 09:34, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Thalia Archibald <thalia@xxxxxxxxxxxxx> writes:
>> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
>> index 782bda007c..ce9231afe6 100644
>> --- a/builtin/fast-import.c
>> +++ b/builtin/fast-import.c
>> @@ -2258,10 +2258,56 @@ static uintmax_t parse_mark_ref_space(const char **p)
>> return mark;
>> }
>> 
>> +/*
>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>> + * space if `include_spaces` is nonzero.
>> + */
> 
> It took me three reads to understand the last sentence.  It would
> have been easier if it were written as "it may contain a space only
> if ...".  I'd also named it "allow_unquoted_spaces"---it is not like
> this function includes extra spaces on top of whatever was given.

Patrick commented on this earlier too:

> On Mar 28, 2024, at 01:21, Patrick Steinhardt <ps@xxxxxx> wrote:
>> 
>> On Fri, Mar 22, 2024 at 12:03:18AM +0000, Thalia Archibald wrote:
>>> +/*
>>> + * Parse the path string into the strbuf. It may be quoted with escape sequences
>>> + * or unquoted without escape sequences. When unquoted, it may only contain a
>>> + * space if `allow_spaces` is nonzero.
>>> + */
>>> +static void parse_path(struct strbuf *sb, const char *p, const char **endp, int allow_spaces, const char *field)
>>> +{
>>> + strbuf_reset(sb);
>>> + if (*p == '"') {
>>> + if (unquote_c_style(sb, p, endp))
>>> + die("Invalid %s: %s", field, command_buf.buf);
>>> + } else {
>>> + if (allow_spaces)
>>> + *endp = p + strlen(p);
>> 
>> I wonder whether `stop_at_unquoted_space` might be more telling. It's
>> not like we disallow spaces here, it's that we treat them as the
>> separator to the next field.
> 
> I agree, but I’d rather something shorter, so I’ve changed it to `include_spaces`.

With all that in mind, I think Patrick is right that the best way to
think of this is that space functions as a field separator, conditional
on this flag. In practice, that leads to restrictions on whether you
can write paths that contain spaces without quotes.

As to naming, `allow_spaces` and `include_spaces` are problematic for
the reasons you both have pointed out. I think `stop_at_unquoted_space`
is problematic, because that’s not where it stops when quoted, but
rather at the close quote. I think that `include_unquoted_spaces` is
good, because it describes that spaces are included in this field when
it is an unquoted string. `allow_unquoted_spaces` implies that its an
error to have a space, but no such error is raised here.

How’s this change? I’ve reworded the relevant sentence and specified any
“it”s and replaced the “when unquoted, …” qualifier with “unquoted
strings may …” to reduce ambiguity.

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index fd23a00150..2070c78c56 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -2259,12 +2259,13 @@ static uintmax_t parse_mark_ref_space(const char **p)
}

/*
- * Parse the path string into the strbuf. It may be quoted with escape sequences
- * or unquoted without escape sequences. When unquoted, it may only contain a
- * space if `include_spaces` is nonzero.
+ * Parse the path string into the strbuf. The path can either be quoted with
+ * escape sequences or unquoted without escape sequences. Unquoted strings may
+ * contain spaces only if `include_unquoted_spaces` is nonzero; otherwise, it
+ * stops parsing at the first space.
 */
static void parse_path(struct strbuf *sb, const char *p, const char **endp,
-		int include_spaces, const char *field)
+		int include_unquoted_spaces, const char *field)
{
	if (*p == '"') {
		if (unquote_c_style(sb, p, endp))
@@ -2272,7 +2273,7 @@ static void parse_path(struct strbuf *sb, const char *p, const char **endp,
		if (strlen(sb->buf) != sb->len)
			die("NUL in %s: %s", field, command_buf.buf);
	} else {
-		if (include_spaces)
+		if (include_unquoted_spaces)
			*endp = p + strlen(p);
		else
			*endp = strchrnul(p, ' ');
@@ -2282,7 +2283,7 @@ static void parse_path(struct strbuf *sb, const char *p, const char **endp,

/*
 * Parse the path string into the strbuf, and complain if this is not the end of
- * the string. It may contain spaces even when unquoted.
+ * the string. Unquoted strings may contain spaces.
 */
static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)
{
@@ -2295,7 +2296,7 @@ static void parse_path_eol(struct strbuf *sb, const char *p, const char *field)

/*
 * Parse the path string into the strbuf, and ensure it is followed by a space.
- * It may not contain spaces when unquoted. Update *endp to point to the first
+ * Unquoted strings may not contain spaces. Update *endp to point to the first
 * character after the space.
 */
static void parse_path_space(struct strbuf *sb, const char *p,


Thalia





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux