[PATCH v4 0/8] fast-import: tighten parsing of paths

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> fast-import has subtle differences in how it parses file paths between each
> occurrence of <path> in the grammar. Many errors are suppressed or not checked,
> which could lead to silent data corruption. A particularly bad case is when a
> front-end sent escapes that Git doesn't recognize (e.g., hex escapes are not
> supported), it would be treated as literal bytes instead of a quoted string.
>
> Bring path parsing into line with the documented behavior and improve
> documentation to fill in missing details.

Updated to address Junio's review comments. Thanks!

This round needed no code changes, so is probably ready, if the documentation
changes look good.

Changes since v3:
* Reword documentation paragraphs on path quoting.
* Fix minor typo in commit message.

Thalia


Thalia Archibald (8):
  fast-import: tighten path unquoting
  fast-import: directly use strbufs for paths
  fast-import: allow unquoted empty path for root
  fast-import: remove dead strbuf
  fast-import: improve documentation for path quoting
  fast-import: document C-style escapes for paths
  fast-import: forbid escaped NUL in paths
  fast-import: make comments more precise

 Documentation/git-fast-import.txt |  31 +-
 builtin/fast-import.c             | 158 ++++----
 t/t9300-fast-import.sh            | 624 +++++++++++++++++++++---------
 3 files changed, 551 insertions(+), 262 deletions(-)

Range-diff against v3:
1:  d9ab0c6a75 = 1:  d6ea8aca46 fast-import: tighten path unquoting
2:  696ca27bb7 = 2:  9499f34aae fast-import: directly use strbufs for paths
3:  39879d0a66 ! 3:  9b1e6b80f5 fast-import: allow unquoted empty path for root
    @@ Commit message
         also be an empty string (`""`) to specify the root of the tree” to “The
         root of the tree can be represented by an empty string as `<path>`”.
     
    -    Thus, we can assume that some front-ends have depended on this behavior.
    +    Thus, we should assume that some front-ends have depended on this
    +    behavior.
     
         Remove this restriction for the destination paths of filecopy and
         filerename and change tests targeting the root to test `""` and ``.
4:  1cef05e59a = 4:  1a2b0dc616 fast-import: remove dead strbuf
5:  2e78690023 ! 5:  fb0d870d53 fast-import: improve documentation for unquoted paths
    @@ Metadata
     Author: Thalia Archibald <thalia@xxxxxxxxxxxxx>
     
      ## Commit message ##
    -    fast-import: improve documentation for unquoted paths
    +    fast-import: improve documentation for path quoting
     
    -    It describes what cannot be in an unquoted path, but not what it is.
    -    Reframe it as a definition of unquoted paths. The requirement that it
    -    not start with `"` is the core element that implies the rest.
    +    It describes what characters cannot be in an unquoted path, but not
    +    their semantics. Reframe it as a definition of unquoted paths. From the
    +    perspective of the parser, whether it starts with `"` is what defines
    +    whether it will parse it as quoted or unquoted.
    +
    +    The restrictions on characters in unquoted paths (with starting-", LF,
    +    and spaces) are explained in the quoted paragraph. Move it to the
    +    unquoted paragraph and reword.
     
         The restriction that the source paths of filecopy and filerename cannot
         contain SP is only stated in their respective sections. Restate it in
    @@ Documentation/git-fast-import.txt: in octal.  Git only supports the following mo
     -A `<path>` string must use UNIX-style directory separators (forward
     -slash `/`), may contain any byte other than `LF`, and must not
     -start with double quote (`"`).
    -+A `<path>` can be written as unquoted bytes or a C-style quoted string:
    ++A `<path>` can be written as unquoted bytes or a C-style quoted string.
      
     -A path can use C-style string quoting; this is accepted in all cases
     -and mandatory if the filename starts with double quote or contains
     -`LF`. In C-style quoting, the complete name should be surrounded with
    -+When a `<path>` does not start with double quote (`"`), it is an
    +-double quotes, and any `LF`, backslash, or double quote characters
    +-must be escaped by preceding them with a backslash (e.g.,
    +-`"path/with\n, \\ and \" in it"`).
    ++When a `<path>` does not start with a double quote (`"`), it is an
     +unquoted string and is parsed as literal bytes without any escape
     +sequences. However, if the filename contains `LF` or starts with double
    -+quote, it must be written as a quoted string. Additionally, the source
    -+`<path>` in `filecopy` or `filerename` must be quoted if it contains SP.
    -+
    -+A `<path>` can use C-style string quoting; this is accepted in all cases
    -+and mandatory in the cases where the filename cannot be represented as
    -+an unquoted string. In C-style quoting, the complete name should be surrounded with
    - double quotes, and any `LF`, backslash, or double quote characters
    - must be escaped by preceding them with a backslash (e.g.,
    - `"path/with\n, \\ and \" in it"`).
    ++quote, it cannot be represented as an unquoted string and must be
    ++quoted. Additionally, the source `<path>` in `filecopy` or `filerename`
    ++must be quoted if it contains SP.
      
     -The value of `<path>` must be in canonical form. That is it must not:
    ++When a `<path>` starts with a double quote (`"`), it is a C-style quoted
    ++string, where the complete filename is enclosed in a pair of double
    ++quotes and escape sequences are used. Certain characters must be escaped
    ++by preceding them with a backslash: `LF` is written as `\n`, backslash
    ++as `\\`, and double quote as `\"`. All filenames can be represented as
    ++quoted strings.
    ++
     +A `<path>` must use UNIX-style directory separators (forward slash `/`)
    -+and must be in canonical form. That is it must not:
    ++and its value must be in canonical form. That is it must not:
      
      * contain an empty directory component (e.g. `foo//bar` is invalid),
      * end with a directory separator (e.g. `foo/` is invalid),
6:  1b07ddffe0 ! 6:  4b6017ded8 fast-import: document C-style escapes for paths
    @@ Commit message
         Signed-off-by: Thalia Archibald <thalia@xxxxxxxxxxxxx>
     
      ## Documentation/git-fast-import.txt ##
    -@@ Documentation/git-fast-import.txt: quote, it must be written as a quoted string. Additionally, the source
    - 
    - A `<path>` can use C-style string quoting; this is accepted in all cases
    - and mandatory in the cases where the filename cannot be represented as
    --an unquoted string. In C-style quoting, the complete name should be surrounded with
    --double quotes, and any `LF`, backslash, or double quote characters
    --must be escaped by preceding them with a backslash (e.g.,
    --`"path/with\n, \\ and \" in it"`).
    -+an unquoted string. In C-style quoting, the complete filename is
    -+surrounded with double quote (`"`) and certain characters must be
    -+escaped by preceding them with a backslash: `LF` is written as `\n`,
    -+backslash as `\\`, and double quote as `\"`. Some characters may may
    -+optionally be written with escape sequences: `\a` for bell, `\b` for
    -+backspace, `\f` for form feed, `\n` for line feed, `\r` for carriage
    -+return, `\t` for horizontal tab, and `\v` for vertical tab. Any byte can
    -+be written with 3-digit octal codes (e.g., `\033`).
    +@@ Documentation/git-fast-import.txt: When a `<path>` starts with a double quote (`"`), it is a C-style quoted
    + string, where the complete filename is enclosed in a pair of double
    + quotes and escape sequences are used. Certain characters must be escaped
    + by preceding them with a backslash: `LF` is written as `\n`, backslash
    +-as `\\`, and double quote as `\"`. All filenames can be represented as
    ++as `\\`, and double quote as `\"`. Some characters may optionally be
    ++written with escape sequences: `\a` for bell, `\b` for backspace, `\f`
    ++for form feed, `\n` for line feed, `\r` for carriage return, `\t` for
    ++horizontal tab, and `\v` for vertical tab. Any byte can be written with
    ++3-digit octal codes (e.g., `\033`). All filenames can be represented as
    + quoted strings.
      
      A `<path>` must use UNIX-style directory separators (forward slash `/`)
    - and must be in canonical form. That is it must not:
     
      ## t/t9300-fast-import.sh ##
     @@ t/t9300-fast-import.sh: test_path_eol_success () {
7:  dc67464b6a = 7:  5b464f4b01 fast-import: forbid escaped NUL in paths
8:  5e02d887bc = 8:  6eb66fce45 fast-import: make comments more precise
-- 
2.44.0






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux