Hi,
I found a way to make unzip respect the UTF-8 flag in ZIP files.
Apparently (from looking at the source) an extra field needs to be
present for it to even look at general purpose bit 11. I sent a patch
that adds an extended timestamp field, which fits the bill.
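For reference, here is a rough sketch of what such an extended timestamp extra field looks like on disk. It assumes the layout documented by Info-ZIP: header ID 0x5455 ("UT"), a 2-byte data size, one flags byte, then a 4-byte little-endian mtime. The helper name ut_field_hex is made up for illustration, and the code is not taken from the patch:

```sh
# Print, as hex, the bytes of an extended timestamp extra field that
# carries only a modification time. The layout is assumed from Info-ZIP's
# extra field documentation; "ut_field_hex" is a hypothetical helper.
ut_field_hex() {
	t=$1	# mtime in seconds since the Unix epoch
	# 55 54 = header ID 0x5455 ("UT"), 05 00 = data size 5,
	# 01 = flags (bit 0: mtime present), then the mtime, little-endian
	printf '55 54 05 00 01 %02x %02x %02x %02x\n' \
		$(( t & 255 )) $(( (t >> 8) & 255 )) \
		$(( (t >> 16) & 255 )) $(( (t >> 24) & 255 ))
}
```

For example, "ut_field_hex 258" prints "55 54 05 00 01 02 01 00 00", because 258 is 0x0102 and the low byte comes first.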
Here are new numbers on ZIP international filename compatibility
(rows: packer, columns: extractor):

                   7-Zip  PeaZip builtin   unzip   unzip   unzip      7z
                 Windows Windows Windows   Linux   mingw Windows   Linux
git     Linux          1       1       1       7       1       1       1
git 1   Linux         37      37       1       7       1       1      37
git 2   Linux         37      37       1      37       1       1      37
git 3   Linux         37      37       1      37      15      15      37
git     mingw          1       1       1       7       1       1       1
git 1   mingw         37      37       1       7       1       1      37
git 2   mingw         37      37       1      37       1       1      37
git 3   mingw         37      37       1      37      15      15      37
7-Zip   Windows       37      37      14      24      15      15      24
PeaZip  Windows       37      37      14      24      15      15      24
zip     Linux         37      37       1      37      15      15      37
zip     Windows       14      14       0      37      15      15       1
builtin Windows       14      14      14       1      14      14       1

The test corpus still consists of 37 files based on the pangrams on the
following web page:
http://www.columbia.edu/~fdc/utf8/index.html#quickbrownfox
The files can be created using the attached script. It also provides a
check command to count the files with correct names, and the results of
that for different ZIP extractors are given in the table. The built-in
ZIP functionality on Windows was only able to pack 14 of the 37 files,
which explains the low score across the board for this packer.
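One way a single table cell could be measured is sketched below: extract the archive into a scratch directory, then run the check command there. This is only an assumption about the procedure, not the exact commands used for the table. The helper name score and the file names in the usage line are hypothetical, and mktemp -d is widely available but not part of POSIX:

```sh
# Extract an archive into a scratch directory and print how many of the
# test file names came out right. "score" is a hypothetical helper; the
# archive and script paths must be absolute because of the cd.
score() {
	archive=$1 script=$2
	shift 2		# the remaining arguments are the extractor command
	tmp=$(mktemp -d) || return 1
	(
		cd "$tmp" || exit 1
		"$@" "$archive" >/dev/null 2>&1
		sh "$script" check
	)
	rm -rf "$tmp"
}
```

Usage would look like: score "$PWD/test.zip" "$PWD/pangrams.sh" unzip -q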
"git 1" is the patch "archive-zip: support UTF-8 paths" added, which
lets archive-zip make use of the UTF-8 flag. "git 2" is "git 1" plus
the patch "archive-zip: declare creator to be Unix for UTF-8 paths".
Both have been posted before. "git 3" is "git 1" plus the new patch
"archive-zip: write extended timestamp".
Let's drop patch 2 (Unix as creator) and keep patches 1 (UTF-8 flag) and
3 (mtime field) to make archive-zip record non-ASCII filenames in a
portable way. It's not perfect, but I don't know how to do any better,
given that Windows' built-in ZIP functionality expects filenames in the
local code page and that projects distributing ZIP files have an
international audience.
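To check whether a given archive carries the UTF-8 flag at all, something like the following sketch could be used. It assumes the standard local file header layout, where the 2-byte little-endian flags word sits at offset 6, and it inspects only the first entry. The helper name utf8_flag is made up for illustration:

```sh
# Report whether general purpose bit 11 (the UTF-8 flag) is set in the
# first local file header of a ZIP file. "utf8_flag" is a hypothetical
# helper name; later entries and the central directory are not inspected.
utf8_flag() {
	# The flags word follows the 4-byte signature and the 2-byte
	# "version needed to extract" field, so it starts at offset 6.
	bytes=$(od -An -tu1 -j6 -N2 "$1") || return 1
	set -- $bytes
	test $# -eq 2 || return 1
	flags=$(( $1 + $2 * 256 ))
	if test $(( flags & 2048 )) -ne 0	# 2048 = 1 << 11
	then
		echo set
	else
		echo unset
	fi
}
```

Usage would look like: utf8_flag test.zip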
René
#!/bin/sh
# Print the names of the 37 test files, one per line.
files() {
	cat <<'EOF'
pangrams/わがよたれぞ　つねならむ
pangrams/うゐのおくやま　けふこえて
pangrams/いろはにほへど　ちりぬるを
pangrams/あさきゆめみじ　ゑひもせず
pangrams/An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall
pangrams/Árvíztűrő tükörfúrógép
pangrams/Blåbærsyltetøy
pangrams/D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór
pangrams/d'œufs abîmés
pangrams/Éava agus Áḋaiṁ
pangrams/Eĥoŝanĝo ĉiuĵaŭde
pangrams/El pingüino Wenceslao hizo kilómetros bajo exhaustiva
pangrams/Falsches Üben von Xylophonmusik quält
pangrams/Flygande bäckasiner söka strax hwila på mjuka tuvor
pangrams/Høj bly gom vandt fræk sexquiz på wc
pangrams/jeden größeren Zwerg
pangrams/lena ṗóg éada ó ṡlí do leasa ṫú
pangrams/Les naïfs ægithales hâtifs pondant à Noël où il gèle
pangrams/lluvia y frío añoraba a su querido cachorro
pangrams/na stĺpe sa ďateľ učí kvákať novú ódu o živote
pangrams/O próximo vôo à noite sobre o Atlântico
pangrams/Pa's wijze lynx bezag vroom het fikse aquaduct
pangrams/Pchnąć w tę łódź jeża lub osiem skrzyń fig
pangrams/põe freqüentemente o único médico
pangrams/Příliš žluťoučký kůň úpěl ďábelské kódy
pangrams/Sævör grét áðan því úlpan var ónýt
pangrams/sont sûrs d'être déçus en voyant leurs drôles
pangrams/Starý kôň na hŕbe kníh žuje tíško povädnuté ruže
pangrams/The quick brown fox jumps over the lazy dog
pangrams/Törkylempijävongahdus
pangrams/Vuol Ruoŧa geđggiid leat máŋga luosa ja čuovžža
pangrams/זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן
pangrams/ξεσκεπάζω την ψυχοφθόρα βδελυγμία
pangrams/ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία
pangrams/Жълтата дюля беше щастлива
pangrams/Съешь же ещё этих мягких французских булок да выпей чаю
pangrams/че пухът, който цъфна, замръзна като гьон
EOF
}
case "$1" in
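create)
	# Create the test corpus as empty files.
	mkdir -p pangrams
	files | while IFS= read -r file
	do
		touch "$file"
	done
	;;
check)
	# Count the files whose names are present exactly as listed.
	files | while IFS= read -r file
	do
		test -f "$file" && echo "$file"
	done | wc -l
	;;
*)
	echo "Usage: $0 create | check" >&2
	exit 1
	;;
esac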