On Mon, Oct 04 2021, Jeff King wrote:

> On Mon, Oct 04, 2021 at 03:42:49AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the output of show_ambiguous_object() added in [1] and last
>> tweaked in [2] to be more friendly to translators. By being able to
>> customize the sprintf formats we're even ready for RTL languages.
>>
>> 1. ef9b0370da6 (sha1-name.c: store and use repo in struct
>>    disambiguate_state, 2019-04-16)
>> 2. 5cc044e0257 (get_short_oid: sort ambiguous objects by type,
>>    then SHA-1, 2018-05-10)
>
> I suspect you meant 1ffa26c461 (get_short_sha1: list ambiguous objects
> on error, 2016-09-26) for the first one.
>
> I had to stare at the patch for a while to understand the goal here. I
> think this would have been a bit easier to review if "change" in your
> first sentence was described a bit more. Perhaps:
>
>   The list of candidates output by show_ambiguous_object() is not marked
>   for translation. At the very least we want to allow the text "the
>   candidates are" to be translated. But we also format individual
>   candidate lines like:
>
>     deadbeef commit 2021-01-01 - Some Commit Message
>
>   by formatting the individual components, then using a printf-format to
>   arrange them in the correct order. Even though there's no text here to
>   be translated, the order and spacing are determined by the format
>   string. Allowing that to be translated helps RTL languages.
>
> I have a few comments on the patch itself. The biggest thing is that it
> changes the format to add an extra newline (between "The candidates
> are:" and the actual list). I don't have a strong opinion on including
> that or not, but it seemed unintentional given the comment on the first
> commit (and its lack of mention here).

That was unintentional, sorry. Will fix.

> The rest are mostly observations, not criticisms. You can take them with
> the appropriate grain of salt given that I don't do translation work
> myself, nor know any RTL languages.
>
>> @@ -366,18 +373,34 @@ static int show_ambiguous_object(const struct object_id *oid, void *data)
>>  		if (commit) {
>>  			struct pretty_print_context pp = {0};
>>  			pp.date_mode.type = DATE_SHORT;
>> -			format_commit_message(commit, " %ad - %s", &desc, &pp);
>> +			format_commit_message(commit, _(" %ad - %s"), &desc, &pp);
>>  		}
>
> Is it OK to use non-printf expansions with the gettext code? Presumably
> the translated string would have the same set of placeholders in it, but
> my understanding is that gettext may sometimes munge the %-placeholders
> (e.g., allowing numbered ones for re-ordering). I admit I don't know how
> any of that works, but I just wonder if this "%ad" may cause confusion
> (or even if not, if it is even possible to re-order it for an RTL
> language).

It's not, oops. I missed that, blinders on for the "%ad". Will construct
it in advance and use %s interpolation separately.

>>  	} else if (type == OBJ_TAG) {
>>  		struct tag *tag = lookup_tag(ds->repo, oid);
>>  		if (!parse_tag(tag) && tag->tag)
>> -			strbuf_addf(&desc, " %s", tag->tag);
>> +			strbuf_addf(&desc, _(" %s"), tag->tag);
>>  	}
>
> I wonder whether " %s" is worthwhile as a translatable string. It does
> seem to be unique among strings marked for translation, but there are a
> ton of non-translated instances. Would context ever matter here?
>
> My impression is that this kind of translation-lego is frowned upon, and
> we might be better off repeating ourselves a bit more. I.e., something
> like:
>
>   if (commit) {
>           struct strbuf date = STRBUF_INIT;
>           struct strbuf subject = STRBUF_INIT;
>           format_commit_message(commit, "%ad", &date, &pp);
>           format_commit_message(commit, "%s", &subject, &pp);
>           strbuf_addf(advice, _(" %s commit %s - %s\n"),
>                       repo_find_unique_abbrev(...),
>                       date.buf, subject.buf);
>           strbuf_release(&date);
>           strbuf_release(&subject);
>   } else if (type == OBJ_TAG) {
>           ...
>           strbuf_addf(advice, _(" %s tag %s\n"),
>                       repo_find_unique_abbrev(...), tag->tag);
>   } else {
>           /* TRANSLATORS: the fields are abbreviated oid and type */
>           strbuf_addf(advice, _(" %s %s\n"),
>                       repo_find_unique_abbrev(...), type_name(type));
>   }
>
> Though that last one similarly has a real lack of context.

Yeah that's better. Will change it to something like that.

>> -	advise(" %s %s%s",
>> -	       repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV),
>> -	       type_name(type) ? type_name(type) : "unknown type",
>> -	       desc.buf);
>> +	strbuf_addf(advice,
>> +		    /*
>> +		     * TRANSLATORS: This is a line of ambiguous object
>> +		     * output. E.g.:
>> +		     *
>> +		     * "deadbeef commit 2021-01-01 - Some Commit Message\n"
>> +		     * "deadbeef tag Some Tag Message\n"
>> +		     * "deadbeef tree\n"
>> +		     *
>> +		     * I.e. the first argument is a short OID, the
>> +		     * second is the type name of the object, and the
>> +		     * third a description of the object, if it's a
>> +		     * commit or tag. In that case the " %ad - %s" and
>> +		     * " %s" formats above will be used for the third
>> +		     * argument.
>> +		     */
>> +		    _(" %s %s%s\n"),
>> +		    repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV),
>> +		    type_name(type) ? type_name(type) : "unknown type",
>> +		    desc.buf);
>
> Would you want to translate "unknown type" here, as well? It's probably
> not that important in practice, but it seems like a funny omission.

Will do.

>> @@ -488,12 +516,19 @@ static enum get_oid_result get_short_oid(struct repository *r,
>>  		if (!ds.ambiguous)
>>  			ds.fn = NULL;
>>
>> -		advise(_("The candidates are:"));
>>  		repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect);
>>  		sort_ambiguous_oid_array(r, &collect);
>>
>> -		if (oid_array_for_each(&collect, show_ambiguous_object, &ds))
>> +		if (oid_array_for_each(&collect, show_ambiguous_object, &as))
>>  			BUG("show_ambiguous_object shouldn't return non-zero");
>> +
>> +		/*
>> +		 * TRANSLATORS: The argument is the list of ambiguous
>> +		 * objects composed in show_ambiguous_object(). See
>> +		 * its "TRANSLATORS" comment for details.
>> +		 */
>> +		advise(_("The candidates are:\n\n%s"), sb.buf);
>
> Here's where the extra newline comes from.
>
> I understand why the earlier ones were changed for RTL languages. But
> this one is always line-oriented. Is the point to help bottom-to-top
> languages? I can buy that, though it feels like that would be something
> that the terminal would deal with (because even with this, you're still
> getting the "error:" line printed separately, for example).
>
> I don't think what this is doing is wrong (at first I wondered about the
> "hint:" lines, but because advise() looks for embedded newlines, we're
> OK). But if the translation doesn't need to reorder things across lines,
> this extra format-into-a-strbuf step doesn't seem necessary. We can just
> call advise() directly in show_ambiguous_object(), as before.
>
> If it is necessary, then note that you leak "sb" here.

I'll keep that bit as-is. It's not strictly necessary, but it gives
translators a bit more context.
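
To illustrate the "construct it in advance and use %s interpolation"
plan for the commit line, here is a minimal standalone sketch, not the
actual patch. It assumes the date and subject have already been expanded
into plain strings. format_candidate() is a hypothetical helper, its
"fmt" parameter stands in for whatever _() would return, and the
reordered format in the usage note is invented for illustration. It
relies on POSIX numbered argument specifiers ("%1$s"), which a
translation can use to reorder arguments:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of git): format one candidate line
 * from components that were expanded ahead of time.
 *
 * "fmt" stands in for the translated format string. Because it only
 * contains ordinary printf "%s" conversions, a translation may reorder
 * the fields with POSIX numbered specifiers such as "%3$s".
 *
 * Returns what snprintf() returns: the length the full line would
 * have, excluding the terminating NUL.
 */
static int format_candidate(char *out, size_t outlen, const char *fmt,
			    const char *abbrev, const char *date,
			    const char *subject)
{
	return snprintf(out, outlen, fmt, abbrev, date, subject);
}
```

Called with " %s commit %s - %s\n" and the strings "deadbeef",
"2021-01-01" and "Some Commit Message", this produces the line shown in
the TRANSLATORS comment. Called with the made-up format
"%3$s - %2$s commit %1$s\n" it emits the same three components in
reverse order, with no "%ad"-style placeholder for the translator to
trip over.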