Re: [PATCH v4 7/8] update-ref: support multiple simultaneous updates

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 04 Sep 2013 14:27:31 -0700

Brad King <brad.king@xxxxxxxxxxx> writes:

> Nothing else uses LF NUL.  I chose it as a starting point for
> this very discussion, which I asked about in $gmane/233653.

The primary reason LF raised my eyebrow is that many subcommands use
"-z" (and NUL) precisely because the payload may have LF in a record,
and LF cannot be used as a record separator without escaping.  They
use NUL knowing that the payload data in fields cannot contain a NUL.
If we used LF as a signal to define the structure of the record, it
would pretty much defeat the whole point of defining a "-z" format.
The -m reason string will be made into a single line deep in the
codepath, but it _can_ contain LF.

I would have been more receptive to, say, a double NUL as a record
terminator with a single NUL as a field terminator, or something like
that, but then we would need a way to express an empty field, since an
empty field followed by its NUL looks just like the end of the record.
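A minimal sketch of that ambiguity, assuming a hypothetical framing in
which every field ends in NUL and every record ends in one extra NUL:
a single record with an empty middle field encodes to exactly the same
bytes as two shorter records, so a reader cannot tell them apart.

```python
def encode(records):
    """Hypothetical double-NUL framing: each field is NUL-terminated and
    each record is terminated by one additional NUL."""
    return "".join("".join(f + "\0" for f in rec) + "\0" for rec in records)

# One record whose third field is empty (e.g. a missing value)...
one_record = encode([["u", "refs/heads/x", "", "abc"]])
# ...versus two separate records with no empty fields at all.
two_records = encode([["u", "refs/heads/x"], ["abc"]])

# Both produce "u\0refs/heads/x\0\0abc\0\0": the framing is ambiguous.
print(one_record == two_records)
```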

> In this particular use case we know the last field will never
> be LF but that may not be so for future cases.  There is no way
> to represent sentinel-terminated arbitrary variable-width records
> of NUL-terminated fields without some kind of escaping for the
> sentinel value, but the whole point of -z is to avoid escaping.

Indeed, but one escape hatch we have is that the payload will not
contain NUL anywhere, so whenever we see a NUL, we can trust that it
defines the structure of the record, and is not a part of the
payload.

Stepping back a bit, here are some observations on the arguments
update-ref can take:

 * "-m <reason>" is a reason given for this entire update. As the
   point of this new feature is to give an all-or-none update to one
   or more refs, I think we should not have to accept more than one
   reason (more specifically, the -m option does _not_ belong to a
   specific record that describes what happens to a single ref).

 * "-d <ref> <oldvalue>" is a way to delete a ref. <oldvalue> may be
   missing.

 * "--no-deref <ref> <newvalue> <oldvalue>" and "<ref> <newvalue>
   <oldvalue>" are ways to update or create a ref. Again <oldvalue>
   may be missing.

So it seems to me that one possible format that is easy for a machine
to generate without ambiguity might be:

    * The first record could be

      m NUL <reason string> NUL

      but it is optional. The reason string may contain LF, but just
      like an invocation from the command line, LF will eventually be
      cleaned up into a SP.

    * Then a series of records of different kinds follow.

      - A delete record looks like this:

        d NUL <ref> NUL <oldvalue> NUL

        If you want to delete the ref without "oldvalue" protection,
        just say

        d NUL <ref> NUL NUL

      - A create/update record looks like one of these:

        u NUL <ref> NUL <newvalue> NUL <oldvalue> NUL
        n NUL <ref> NUL <newvalue> NUL <oldvalue> NUL

        Again, if you want to create or update the ref without
        "oldvalue" protection, just say

        u NUL <ref> NUL <newvalue> NUL NUL
        n NUL <ref> NUL <newvalue> NUL NUL

    * EOF signals the end of the request.

I am not saying the above is the best format, but the point is that
the mode of the operation defines the structure.  Unlike parsing XML
or JSON, where you first parse the structure and then interpret what
each element means, you can define a simple format where the kind of
element comes upfront to let the parser/interpreter know what is
expected to follow it.
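To show what "the kind comes upfront" buys, here is a rough Python
sketch of a parser for the format above (hypothetical names; this is
not update-ref's actual code).  It looks at the record kind first and
only then knows how many NUL-terminated fields to consume.  Treating an
empty <oldvalue> as "no oldvalue protection" follows the text; mapping
"n" to --no-deref is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Fields that follow each record kind in the format sketched above.
FIELDS_AFTER_KIND: Dict[str, Tuple[str, ...]] = {
    "m": ("reason",),
    "d": ("ref", "oldvalue"),
    "u": ("ref", "newvalue", "oldvalue"),
    "n": ("ref", "newvalue", "oldvalue"),  # assumed: --no-deref variant
}


def parse_request(data: bytes) -> Tuple[Optional[str], List[dict]]:
    # The payload never contains NUL, so every NUL ends a field.  The
    # element after the final NUL is empty and is dropped.
    if data and not data.endswith(b"\0"):
        raise ValueError("truncated input: last field is not NUL-terminated")
    tokens = data.decode("utf-8").split("\0")[:-1] if data else []

    reason: Optional[str] = None
    records: List[dict] = []
    pos = 0
    while pos < len(tokens):  # running out of input (EOF) ends the request
        kind = tokens[pos]
        names = FIELDS_AFTER_KIND.get(kind)
        if names is None:
            raise ValueError("unknown record kind %r" % kind)
        # The kind alone tells us how many fields to take next.
        values = tokens[pos + 1:pos + 1 + len(names)]
        if len(values) != len(names):
            raise ValueError("record %r is missing fields" % kind)
        pos += 1 + len(names)
        fields = dict(zip(names, values))

        if kind == "m":
            # The optional reason applies to the whole update, so it may
            # only appear once, as the very first record.
            if records or reason is not None:
                raise ValueError("the m record may only appear first")
            reason = fields["reason"]
        else:
            # An empty <oldvalue> means "no oldvalue protection".
            fields["oldvalue"] = fields["oldvalue"] or None
            records.append({"kind": kind, **fields})
    return reason, records
```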
