On Mon, May 23, 2022 at 4:07 PM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > > > On Mon, May 23 2022, Jiang Xin wrote: > > > From: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> > > > > Before commit fc0fd5b23b (Makefile: help gettext tools to cope with our > > custom PRItime format, 2017-07-20) we'd consider source files as-is > > with gettext, but because we need to understand PRItime in the same way > > that gettext itself understands PRIuMAX we'd first check if we had a > > clean checkout, then munge all of the processed files in-place with > > "sed", generate "po/git.pot", and then finally "reset --hard" to undo > > our changes. > > > > By generating "pot" snippets in ".build/pot/po" for each source file > > and rewriting certain source files with PRItime macros to temporary > > files in ".build/pot/po", we can avoid running "make pot" by altering > > files in place and doing a "reset --hard" afterwards. > > > > This speed of "make pot" is slower than before on an initial run, > > because we run "xgettext" many times (once per source file), but it > > can be boosted by parallelization. It is *much* faster for incremental > > runs, and will allow us to implement related targets in subsequent > > commits. > > > > When the "pot" target was originally added in cd5513a7168 (i18n: > > Makefile: "pot" target to extract messages marked for translation, > > 2011-02-22) it behaved like a "normal" target. I.e. we'd skip the > > re-generation of the po/git.pot if nothing had to be done. > > > > Then after po/git.pot was checked in in dce37b66fb0 (l10n: initial > > git.pot for 1.7.10 upcoming release, 2012-02-13) the target was broken > > until 1f31963e921 (i18n: treat "make pot" as an explicitly-invoked > > target, 2014-08-22) when it was made to depend on "FORCE". I.e. the > > Makefile's dependency resolution inherently can't handle incremental > > building when the target file may be updated by git (or something else > > external to "make"). But this case no longer applies, so FORCE is no > > longer needed. > > > > That out of the way, the main logic change here is getting rid of the > > "reset --hard": > > > > We'll generate intermediate ".build/pot/po/%.po" files from "%", which > > is handy to see at a glance what strings (if any) in a given file are > > marked for translation: > > > > $ make .build/pot/po/pretty.c.po > > [...] > > $ cat .build/pot/po/pretty.c.po > > #: pretty.c:1051 > > msgid "unable to parse --pretty format" > > msgstr "" > > $ > > > > For these C source files which contain the PRItime macros, we will > > create temporary munged "*.c" files in a tree in ".build/pot/po" > > corresponding to our source tree, and have "xgettext" consider those. > > The rule needs to be careful to "(cd .build/pot/po && ...)", because > > otherwise the comments in the po/git.pot file wouldn't refer to the > > correct source locations (they'd be prefixed with ".build/pot/po"). > > These temporary munged "*.c” files will be removed immediately after > > the corresponding po files are generated, because some development tools > > cannot ignore the duplicate source files in the ".build" directory > > according to the ".gitignore" file, and that may cause trouble. > > > > The output of the generated po/git.pot file is changed in one minor > > way: Because we're using msgcat(1) instead of xgettext(1) to > > concatenate the output we'll now disambiguate where "TRANSLATORS" > > comments come from, in cases where a message is the same in N files, > > and either only one has a "TRANSLATORS" comment, or they're > > different. E.g. for the "Your edited hunk[...]" message we'll now > > apply this change (comment content elided): > > > > +#. #-#-#-#-# add-patch.c.po #-#-#-#-# > > #. TRANSLATORS: do not translate [y/n] > > [...] > > +#. #-#-#-#-# git-add--interactive.perl.po #-#-#-#-# > > #. TRANSLATORS: do not translate [y/n] > > [...] > > #: add-patch.c:1253 git-add--interactive.perl:1244 > > msgid "" > > "Your edited hunk does not apply. Edit again (saying \"no\" discards!) [y/n]? " > > msgstr "" > > > > There are six such changes, and they all make the context more > > understandable, as msgcat(1) is better at handling these edge cases > > than xgettext(1)'s previously used "--join-existing" flag. > > > > But filenames in the above disambiguation lines of extracted-comments > > have an extra ".po" extension compared to the filenames at the file > > locations. While we could rename the intermediate ".build/pot/po/%.po" > > files without the ".po" extension to use more intuitive filenames in > > the disambiguation lines of extracted-comments, but that will confuse > > developer tools with lots of invalid C or other source files in > > ".build/pot/po" directory. > > > > The addition of "--omit-header" option for xgettext makes the "pot" > > snippets in ".build/pot/po/*.po" smaller. But as we'll see in a > > subsequent commit this header behavior has been hiding an > > encoding-related bug from us, so let's carry it forward instead of > > re-generating it with xgettext(1). > > > > The "po/git.pot" file should have a header entry, because a proper > > header entry will increase the speed of creating a new po file using > > msginit and set a proper "POT-Creation-Date:" field in the header > > entry of a "po/XX.po" file. We use xgettext to generate a separate > > header file at ".build/pot/git.header" from "/dev/null", and use this > > header to assemble "po/git.pot". > > > > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> > > Signed-off-by: Jiang Xin <zhiyou.jx@xxxxxxxxxxxxxxx> > > Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> > > --- > > .gitignore | 1 + > > Makefile | 80 ++++++++++++++++++++++++++++++++++++++++-------------- > > 2 files changed, 60 insertions(+), 21 deletions(-) > > > > diff --git a/.gitignore b/.gitignore > > index e81de1063a..a452215764 100644 > > --- a/.gitignore > > +++ b/.gitignore > > @@ -200,6 +200,7 @@ > > *.[aos] > > *.o.json > > *.py[co] > > +.build/ > > .depend/ > > *.gcda > > *.gcno > > diff --git a/Makefile b/Makefile > > index 46914dcd80..1962999c18 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -569,6 +569,7 @@ INSTALL = install > > TCL_PATH = tclsh > > TCLTK_PATH = wish > > XGETTEXT = xgettext > > +MSGCAT = msgcat > > MSGFMT = msgfmt > > CURL_CONFIG = curl-config > > GCOV = gcov > > @@ -2706,6 +2707,7 @@ XGETTEXT_FLAGS = \ > > --force-po \ > > --add-comments=TRANSLATORS: \ > > --msgid-bugs-address="Git Mailing List <git@xxxxxxxxxxxxxxx>" \ > > + --package-name=Git \ > > --sort-by-file \ > > --from-code=UTF-8 > > XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \ > > @@ -2714,6 +2716,7 @@ XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \ > > --keyword=gettextln --keyword=eval_gettextln > > XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --language=Perl \ > > --keyword=__ --keyword=N__ --keyword="__n:1,2" > > +MSGCAT_FLAGS = --sort-by-file > > LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \ > > $(GENERATED_H) > > LOCALIZED_SH = $(SCRIPT_SH) > > @@ -2726,34 +2729,68 @@ LOCALIZED_SH += t/t0200/test.sh > > LOCALIZED_PERL += t/t0200/test.perl > > endif > > > > -## Note that this is meant to be run only by the localization coordinator > > -## under a very controlled condition, i.e. (1) it is to be run in a > > -## Git repository (not a tarball extract), (2) any local modifications > > -## will be lost. > > +## We generate intermediate .build/pot/po/%.po files containing a > > +## extract of the translations we find in each file in the source > > +## tree. We will assemble them using msgcat to create the final > > +## "po/git.pot" file. > > +LOCALIZED_ALL_GEN_PO = > > + > > +LOCALIZED_C_GEN_PO = $(LOCALIZED_C:%=.build/pot/po/%.po) > > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_C_GEN_PO) > > + > > +LOCALIZED_SH_GEN_PO = $(LOCALIZED_SH:%=.build/pot/po/%.po) > > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_SH_GEN_PO) > > + > > +LOCALIZED_PERL_GEN_PO = $(LOCALIZED_PERL:%=.build/pot/po/%.po) > > +LOCALIZED_ALL_GEN_PO += $(LOCALIZED_PERL_GEN_PO) > > + > > ## Gettext tools cannot work with our own custom PRItime type, so > > ## we replace PRItime with PRIuMAX. We need to update this to > > ## PRIdMAX if we switch to a signed type later. > > +$(LOCALIZED_C_GEN_PO): .build/pot/po/%.po: % > > + $(call mkdir_p_parent_template) > > + $(QUIET_XGETTEXT) \ > > + if grep -q PRItime $<; then \ > > + (\ > > + sed -e 's|PRItime|PRIuMAX|g' <$< \ > > + >.build/pot/po/$< && \ > > + cd .build/pot/po && \ > > + $(XGETTEXT) --omit-header \ > > + -o $(@:.build/pot/po/%=%) \ > > + $(XGETTEXT_FLAGS_C) $< && \ > > + rm $<; \ > > + ); \ > > + else \ > > + $(XGETTEXT) --omit-header \ > > + -o $@ $(XGETTEXT_FLAGS_C) $<; \ > > + fi > > > > -po/git.pot: $(GENERATED_H) FORCE > > - # All modifications will be reverted at the end, so we do not > > - # want to have any local change. > > - git diff --quiet HEAD && git diff --quiet --cached > > +$(LOCALIZED_SH_GEN_PO): .build/pot/po/%.po: % > > + $(call mkdir_p_parent_template) > > + $(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \ > > + -o$@ $(XGETTEXT_FLAGS_SH) $< > > > > - @for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \ > > - do \ > > - sed -e 's|PRItime|PRIuMAX|g' <"$$s" >"$$s+" && \ > > - cat "$$s+" >"$$s" && rm "$$s+"; \ > > - done > > +$(LOCALIZED_PERL_GEN_PO): .build/pot/po/%.po: % > > + $(call mkdir_p_parent_template) > > + $(QUIET_XGETTEXT)$(XGETTEXT) --omit-header \ > > + -o$@ $(XGETTEXT_FLAGS_PERL) $< > > +q > > +define gen_pot_header > > +$(XGETTEXT) $(XGETTEXT_FLAGS_C) \ > > + -o - /dev/null | \ > > +sed -e 's|charset=CHARSET|charset=UTF-8|' \ > > + -e 's|\(Last-Translator: \)FULL NAME <.*>|\1make by the Makefile|' \ > > + -e 's|\(Language-Team: \)LANGUAGE <.*>|\1Git Mailing List <git@xxxxxxxxxxxxxxx>|' \ > > + >$@ && \ > > +echo '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"' >>$@ > > +endef > > > > - $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ $(XGETTEXT_FLAGS_C) $(LOCALIZED_C) > > - $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_SH) \ > > - $(LOCALIZED_SH) > > - $(QUIET_XGETTEXT)$(XGETTEXT) -o$@+ --join-existing $(XGETTEXT_FLAGS_PERL) \ > > - $(LOCALIZED_PERL) > > +.build/pot/git.header: $(LOCALIZED_ALL_GEN_PO) > > + $(call mkdir_p_parent_template) > > + $(QUIET_GEN)$(gen_pot_header) > > So re a previous round I think the addition of a header to po/git.pot > doesn't make sense, i.e. it's not needed, so we can just drop this > entirely. You missed a note about this in commit log in this reroll: The "po/git.pot" file should have a header entry, because a proper header entry will increase the speed of creating a new po file using msginit and set a proper "POT-Creation-Date:" field in the header entry of a "po/XX.po" file. We use xgettext to generate a separate header file at ".build/pot/git.header" from "/dev/null", and use this header to assemble "po/git.pot". > But if you really want to include it anyway then this dependency is > broken, the header just depends on this gen_pot_header you've defined, > we don't need to re-generate it every time any single source file > changes. The header file ".build/pot/git.header" should have the same prerequisites as the file "po/git.pot". This is because we need to refresh the timestamp of the "POT-Creation-Date:" field. > Or is the intent to have its timestamp bumped if any of the source files > change? Right. Just like using xgettext to create a "po/git.pot" with a fresh header entry, we should recreate the header file for msgcat to create "po/git.pot". We want to switch to a new l10n workflow, and we want everything works as usual. When l10n contributors update their "po/XX.po" file, it should contain the latest "POT-Creation-Date" timestamp instead of sticking to a fixed and outdated timestamp. > If so I think that to the extent we need headers at all (for the *.po > files) it might be worthwhile to just have the timestamp be the same > epoch we use for the tests in t/, i.e. never update it and avoid the > needless churn. If people need to check when things were last changed, > or initially added, they can use this little thing called "git" :) I stick to my option, see above.