[RFC 0/8] Antique UTF-8 filename support


 



From: Robin Rosenberg <robin.rosenberg@xxxxxxxxx>

Since there is some interest in the topic now, I can republish my old, 2½-year-old
patches so there is some real code to comment on. They apply on top of
6dcfa306f2b67b733a7eb2d7ded1bc9987809edb. For completeness I am sending
all patches, but the interesting stuff is in patches 4 and 5. Beware of encoding
issues with the test cases.

They do not handle Windows UTF-16 at all, but I think that is just a matter of writing
Windows-specific wrappers for the filename and directory handling routines.
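A Windows wrapper of that kind might look roughly like the following. This is a hypothetical sketch, not code from the series: utf8_fopen() and utf8_to_wide() are made-up names, and it assumes file names are kept as UTF-8 internally and converted to UTF-16 only at the Win32 boundary, using MultiByteToWideChar() and _wfopen(). On other platforms it just hands the bytes to fopen() and leaves locale conversion aside.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>

/* Hypothetical helper: UTF-8 -> malloc'd UTF-16 string, NULL on error. */
static wchar_t *utf8_to_wide(const char *s)
{
	/* First call only measures; -1 means "include the terminating NUL". */
	int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
	if (n <= 0)
		return NULL;
	wchar_t *w = malloc(n * sizeof(*w));
	if (w && MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) != n) {
		free(w);
		return NULL;
	}
	return w;
}
#endif

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
static FILE *utf8_fopen(const char *path, const char *mode)
{
#ifdef _WIN32
	/* Win32 wants UTF-16 names, so convert both arguments first. */
	wchar_t *wpath = utf8_to_wide(path);
	wchar_t *wmode = utf8_to_wide(mode);
	FILE *f = (wpath && wmode) ? _wfopen(wpath, wmode) : NULL;
	free(wpath);
	free(wmode);
	return f;
#else
	/* POSIX file names are plain bytes; this sketch passes them through. */
	return fopen(path, mode);
#endif
}
```

The same pattern would apply to the other filename and directory routines (stat, unlink, rename, opendir/readdir and so on).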

Feel free to revamp, steal ideas, and add constructive criticism. Don't even
think of cherry-picking and rebasing; it's careful handpicking with copy/paste at
best, but mostly it's fuel for discussion.

I admit some parts are quite kludgy and probably slow, as I was primarily
interested in seeing whether it was even feasible, which it was. However, there was
simply no interest, so there was no point in optimizing it. It was simply the
wrong problem at the time.

Disclaimer: A problem with this approach is that, although it does character
conversion, a non-UTF-8 locale will not let you manage just any repository.
That is basically impossible and hence not the goal. It does help people with
the same (or similar) languages cooperate without enforcing a common encoding,
as long as they stick to the common characters, i.e. the ones that can be
converted between the locales involved.
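The "only characters that convert" rule can be illustrated with POSIX iconv(3). This is a minimal sketch, not code from the patches: convert_path() is a hypothetical name, and it assumes both locale encodings are known by their iconv names (for example "UTF-8" and "ISO-8859-1"). A name is accepted only if every character has an exact equivalent in the target encoding; anything else is refused rather than silently altered.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch series): convert the
 * NUL-terminated string `path` from encoding `from` to encoding `to`.
 * Returns a malloc'd string, or NULL if the conversion is unsupported
 * or any character has no exact equivalent in `to`.
 */
static char *convert_path(const char *path, const char *from, const char *to)
{
	iconv_t cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return NULL;

	size_t inleft = strlen(path);
	/* Generous for UTF-8 and 8-bit targets; overflow just fails (E2BIG). */
	size_t cap = 4 * inleft + 8;
	char *out = malloc(cap);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}

	char *in = (char *)path;   /* iconv() advances this but never writes through it */
	char *dst = out;
	size_t outleft = cap - 1;  /* reserve room for the terminating NUL */

	/*
	 * iconv() returns (size_t)-1 on error (e.g. EILSEQ) and otherwise the
	 * number of non-reversible conversions; treat both as failure so a
	 * name is never mangled.
	 */
	size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
	if (rc == 0)
		rc = iconv(cd, NULL, NULL, &dst, &outleft);  /* flush any shift state */
	iconv_close(cd);

	if (rc != 0) {
		free(out);
		return NULL;
	}
	*dst = '\0';
	return out;
}
```

For example, "räksmörgås" survives a round trip between UTF-8 and ISO-8859-1, while a name containing "€" is refused, since Latin-1 has no euro sign.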

This is probably the most outdated patch series ever.

-- robin

Robin Rosenberg (8):
(mostly obsolete)
  UTF helpers
  Messages in locale.
  Extend tests to cover locale wrt to commit messages.

The interesting stuff (patches 4 & 5)
  UTF file names.
  Extend all tests to work on UTF-8 filenames.

old wip
  test of utf_locallinks
  Convert symlink dest in diff
  UTF-8 in non-SHA1-objects

 Makefile                            |    8 +-
 builtin-add.c                       |    5 +-
 builtin-cat-file.c                  |    6 +-
 builtin-checkout-index.c            |   46 +++-
 builtin-commit-tree.c               |    9 +-
 builtin-ls-files.c                  |   26 ++-
 builtin-ls-tree.c                   |   16 +-
 builtin-rev-parse.c                 |    7 +-
 builtin-update-index.c              |   18 +-
 builtin-write-tree.c                |    5 +-
 diff.c                              |  111 ++++++--
 dir.c                               |   22 +-
 git-commit.sh                       |    5 +
 git-compat-util.h                   |   43 +++
 git-rebase.sh                       |    1 +
 git.c                               |    9 +
 log-tree.c                          |    4 +-
 merge-index.c                       |   25 ++-
 read-cache.c                        |    8 +-
 refs.c                              |   11 +-
 setup.c                             |   28 ++-
 t/lib-read-tree-m-3way.sh           |   38 ++--
 t/t-utf-filenames.sh                |   95 +++++++
 t/t-utf-msg.sh                      |   43 +++
 t/t0000-basic.sh                    |  117 ++++----
 t/t0010-racy-git.sh                 |   10 +-
 t/t1000-read-tree-m-3way.sh         |  240 +++++++++---------
 t/t1001-read-tree-m-2way.sh         |   56 ++--
 t/t1020-subdirectory.sh             |   63 +++---
 t/t1100-commit-tree-options.sh      |   12 +-
 t/t1400-update-ref.sh               |   10 +-
 t/t2000-checkout-cache-clash.sh     |   18 +-
 t/t2001-checkout-cache-clash.sh     |   30 +-
 t/t2002-checkout-cache-u.sh         |    8 +-
 t/t2003-checkout-cache-mkdir.sh     |  118 ++++----
 t/t2004-checkout-cache-temp.sh      |  144 +++++-----
 t/t2100-update-cache-badpath.sh     |   48 ++--
 t/t2101-update-index-reupdate.sh    |   56 ++--
 t/t3000-ls-files-others.sh          |   36 ++--
 t/t3002-ls-files-dashpath.sh        |   24 +-
 t/t3010-ls-files-killed-modified.sh |  104 ++++----
 t/t3020-ls-files-error-unmatch.sh   |   10 +-
 t/t3100-ls-tree-restrict.sh         |  122 +++++-----
 t/t3101-ls-tree-dirname.sh          |   88 +++---
 t/t3400-rebase.sh                   |   18 +-
 t/t3401-rebase-partial.sh           |   24 +-
 t/t3402-rebase-merge.sh             |   17 +-
 t/t3403-rebase-skip.sh              |   10 +-
 t/t3500-cherry.sh                   |   26 +-
 t/t3600-rm.sh                       |   28 +-
 t/t3700-add.sh                      |   30 +-
 t/t4000-diff-format.sh              |   26 +-
 t/t4001-diff-rename.sh              |   20 +-
 t/t4002-diff-basic.sh               |  160 ++++++------
 t/t4003-diff-rename-1.sh            |   66 +++---
 t/t4004-diff-rename-symlink.sh      |   40 ++--
 t/t4005-diff-rename-2.sh            |   54 ++--
 t/t4006-diff-mode.sh                |   14 +-
 t/t4008-diff-break-rewrite.sh       |  100 ++++----
 t/t4009-diff-rename-4.sh            |   63 +++---
 t/t4011-diff-symlink.sh             |   38 ++--
 t/t4012-diff-binary.sh              |   16 +-
 t/t7301-rev-parse.sh                |   20 ++
 t/test-lib.sh                       |   13 +-
 test-utf.c                          |   61 +++++
 utf.c                               |  501 +++++++++++++++++++++++++++++++++++
 utf.h                               |   27 ++
 67 files changed, 2133 insertions(+), 1142 deletions(-)
 create mode 100755 t/t-utf-filenames.sh
 create mode 100755 t/t-utf-msg.sh
 create mode 100755 t/t7301-rev-parse.sh
 create mode 100644 test-utf.c
 create mode 100644 utf.c
 create mode 100644 utf.h
