Re: git archive --format zip utf-8 issues


Hi,

I found a way to make unzip respect the UTF-8 flag in ZIP files: apparently (judging from its source) an extra field needs to be present for it to even look at general purpose bit flag 11. I sent a patch that adds an extended timestamp field, which fits the bill.
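For readers who want to check an archive by hand, here is a minimal sketch that reads the general purpose bit flag and the extra field length from the first local file header. The helper name zip_first_entry_info is made up for illustration, and the byte offsets (6 and 28) are taken from the ZIP local file header layout, not from the patches. It only looks at the first entry and assumes the file starts with a local file header.

```shell
#!/bin/sh
# Sketch only: zip_first_entry_info is a hypothetical helper, not part of git.
# It inspects the first local file header of the given ZIP file.
zip_first_entry_info() {
	zipfile=$1
	# Bytes 6-7: general purpose bit flag, stored little-endian.
	set -- $(od -An -tu1 -j6 -N2 "$zipfile")
	flags=$(( $1 + $2 * 256 ))
	# Bytes 28-29: length of the extra field, stored little-endian.
	set -- $(od -An -tu1 -j28 -N2 "$zipfile")
	extra_len=$(( $1 + $2 * 256 ))
	# Bit 11 (value 2048) marks the filename as UTF-8.
	if [ $(( flags & 2048 )) -ne 0 ]
	then
		echo "utf8-flag=set extra-field-bytes=$extra_len"
	else
		echo "utf8-flag=unset extra-field-bytes=$extra_len"
	fi
}
```

It would be called as zip_first_entry_info some.zip; a non-zero extra field length only says that some extra field is present, not which one.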

Here are new numbers on ZIP international filename compatibility:

		7-Zip	PeaZip	builtin	unzip	unzip	unzip	7z
		Windows	Windows	Windows	Linux	mingw	Windows	Linux
git	Linux	1	1	1	7	1	1	1
git 1	Linux	37	37	1	7	1	1	37
git 2	Linux	37	37	1	37	1	1	37
git 3	Linux	37	37	1	37	15	15	37
git	mingw	1	1	1	7	1	1	1
git 1	mingw	37	37	1	7	1	1	37
git 2	mingw	37	37	1	37	1	1	37
git 3	mingw	37	37	1	37	15	15	37
7-Zip	Windows	37	37	14	24	15	15	24
PeaZip	Windows	37	37	14	24	15	15	24
zip	Linux	37	37	1	37	15	15	37
zip	Windows	14	14	0	37	15	15	1
builtin	Windows	14	14	14	1	14	14	1

The test corpus still consists of 37 files based on the pangrams on the following web page:

	http://www.columbia.edu/~fdc/utf8/index.html#quickbrownfox

The files can be created using the attached script. It also provides a check command that counts the files with correct names; the results of that for different ZIP extractors are given in the table. The built-in ZIP functionality on Windows was only able to pack 14 of the 37 files, which explains the low score across the board for this packer.

"git 1" is git with the patch "archive-zip: support UTF-8 paths" applied, which lets archive-zip make use of the UTF-8 flag. "git 2" is "git 1" plus the patch "archive-zip: declare creator to be Unix for UTF-8 paths". Both have been posted before. "git 3" is "git 1" plus the new patch "archive-zip: write extended timestamp".

Let's drop patch 2 (Unix as creator) and keep patches 1 (UTF-8 flag) and 3 (mtime field) to make archive-zip record non-ASCII filenames in a portable way. It's not perfect, but I don't know how to do any better, given that Windows' built-in ZIP functionality expects filenames in the local code page while projects distributing ZIP files have an international audience.
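To illustrate the code page problem, here is a small sketch of what happens when the bytes of a UTF-8 filename are interpreted in a legacy code page. CP1252 merely stands in for "the local code page" here; that choice and the helper name misread_as_cp1252 are assumptions for the example, and it relies on an iconv that knows the CP1252 name.

```shell
#!/bin/sh
# Sketch only: misread_as_cp1252 is a hypothetical helper.
# It treats the UTF-8 bytes of its argument as CP1252 text and
# prints the result, converted back to UTF-8 for display.
misread_as_cp1252() {
	printf '%s' "$1" | iconv -f CP1252 -t UTF-8
}

misread_as_cp1252 'Blåbærsyltetøy'
echo
# prints: BlÃ¥bÃ¦rsyltetÃ¸y
```

Names containing bytes that have no mapping in the chosen code page may make iconv fail instead of printing garbage.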

René

#!/bin/sh

files() {
cat <<EOF
pangrams/わがよたれぞ　つねならむ
pangrams/うゐのおくやま　けふこえて
pangrams/いろはにほへど　ちりぬるを
pangrams/あさきゆめみじ　ゑひもせず
pangrams/An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall
pangrams/Árvíztűrő tükörfúrógép
pangrams/Blåbærsyltetøy
pangrams/D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór
pangrams/d'œufs abîmés
pangrams/Éava agus Áḋaiṁ
pangrams/Eĥoŝanĝo ĉiuĵaŭde
pangrams/El pingüino Wenceslao hizo kilómetros bajo exhaustiva
pangrams/Falsches Üben von Xylophonmusik quält
pangrams/Flygande bäckasiner söka strax hwila på mjuka tuvor
pangrams/Høj bly gom vandt fræk sexquiz på wc
pangrams/jeden größeren Zwerg
pangrams/lena ṗóg éada ó ṡlí do leasa ṫú
pangrams/Les naïfs ægithales hâtifs pondant à Noël où il gèle
pangrams/lluvia y frío añoraba a su querido cachorro
pangrams/na stĺpe sa ďateľ učí kvákať novú ódu o živote
pangrams/O próximo vôo à noite sobre o Atlântico
pangrams/Pa's wijze lynx bezag vroom het fikse aquaduct
pangrams/Pchnąć w tę łódź jeża lub osiem skrzyń fig
pangrams/põe freqüentemente o único médico
pangrams/Příliš žluťoučký kůň úpěl ďábelské kódy
pangrams/Sævör grét áðan því úlpan var ónýt
pangrams/sont sûrs d'être déçus en voyant leurs drôles
pangrams/Starý kôň na hŕbe kníh žuje tíško povädnuté ruže
pangrams/The quick brown fox jumps over the lazy dog
pangrams/Törkylempijävongahdus
pangrams/Vuol Ruoŧa geđggiid leat máŋga luosa ja čuovžža
pangrams/זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן
pangrams/ξεσκεπάζω την ψυχοφθόρα βδελυγμία
pangrams/ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία
pangrams/Жълтата дюля беше щастлива
pangrams/Съешь же ещё этих мягких французских булок да выпей чаю
pangrams/че пухът, който цъфна, замръзна като гьон
EOF
}

case "$1" in
create)
	mkdir -p pangrams
	files | while IFS= read -r file
	do
		touch "$file"
	done
	;;
check)
	files | while IFS= read -r file
	do
		test -f "$file" && echo "$file"
	done | wc -l
	;;
*)
	echo "Usage: $0 create | check" >&2
	exit 1
	;;
esac
