Ævar Arnfjörð Bjarmason wrote:

> And even though gettext tries to make cases like these fast
> (http://www.gnu.org/software/hello/manual/gettext/Optimized-gettext.html)
> it's still a lot slower than hardcoded English:
>
>     perl -MBenchmark=:all -MData::Dump=dump -E 'cmpthese(10, {
>         outside => sub { system "./test-outside-loop >/dev/null" },
>         inside  => sub { system "./test-in-loop >/dev/null" },
>     });'
>
>             s/iter  inside outside
>     inside    13.4      --    -83%
>     outside   2.26    495%      --

Given:

-- 8< --
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <libintl.h>
#include "gettext.h"

long int foo(long int x)
{
	return x * x;
}

int main(void)
{
	const char *podir = "/usr/local/share/locale";
	long int i;

	bindtextdomain("git", podir);
	setlocale(LC_MESSAGES, "");
	setlocale(LC_CTYPE, "");
	textdomain("git");

	for (i = 0; i < 1000000; i++)
		printf(_("Some interesting label: %ld\n"), foo(i));
	return 0;
}
-- >8 --

No message catalog is installed here, and I compile with
gcc-4.5 -Wall -W -O2.  The results are similar.

A: the standard way.  gettext.h contains "#define _(s) gettext(s)" or

| static inline __attribute__((__format_arg__(1)))
| char *_(const char *s)
| {
| 	return gettext(s);
| }

 6.74user 0.02system 0:06.78elapsed 99%CPU (0avgtext+0avgdata 2304maxresident)k
 0inputs+0outputs (0major+182minor)pagefaults 0swaps

(about 7 seconds.)

B: noop.  gettext.h contains "#define _(s) s"

 1.35user 0.01system 0:01.37elapsed 99%CPU (0avgtext+0avgdata 2192maxresident)k
 0inputs+0outputs (0major+172minor)pagefaults 0swaps

(about 1.5 seconds.)

It would seem that __attribute__((__pure__)) should let the compiler
give us the best of both worlds, but no luck.  Even
__attribute__((__const__)) is ignored; the compiler inlines the body
of _() before it has a chance to notice.
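For concreteness, the transformation being hoped for is the one you
would write by hand: look the format string up once, before the loop.
A minimal sketch follows.  lookup() is a hypothetical stand-in for
gettext() (it returns its argument and counts its calls) so that this
builds without libintl or a message catalog; format_in_loop() and
format_hoisted() are likewise names made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for gettext(): returns msgid unchanged and
 * counts how many times it was called.  Not part of libintl.
 */
static unsigned long lookup_calls;

static const char *lookup(const char *msgid)
{
	lookup_calls++;
	return msgid;
}

/* Lookup repeated on every iteration, as in the test program. */
static unsigned long format_in_loop(long n)
{
	char buf[64];
	long i;

	lookup_calls = 0;
	for (i = 0; i < n; i++)
		snprintf(buf, sizeof(buf),
			 lookup("Some interesting label: %ld\n"), i);
	return lookup_calls;
}

/* Lookup hoisted by hand: one call, result reused for every iteration. */
static unsigned long format_hoisted(long n)
{
	char buf[64];
	const char *fmt;
	long i;

	lookup_calls = 0;
	fmt = lookup("Some interesting label: %ld\n");
	for (i = 0; i < n; i++)
		snprintf(buf, sizeof(buf), fmt, i);
	return lookup_calls;
}
```

format_in_loop(n) performs n lookups and format_hoisted(n) performs
one.  The hoisted form is only valid if the lookup gives the same
answer each time for a given argument, which is the kind of promise
__pure__ and __const__ are meant to express.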
We can fool the compiler into paying attention by making it not
inlinable: if gettext.h contains

| extern char *_(const char *s) __attribute__((__format_arg__(1), __const__));

and a separate gettext.c contains

| #include <libintl.h>
| #include "gettext.h"
| char *_(const char *s) { return gettext(s); }

we get the performance of B again:

 1.36user 0.01system 0:01.38elapsed 98%CPU (0avgtext+0avgdata 2304maxresident)k
 0inputs+0outputs (0major+180minor)pagefaults 0swaps

This amounts to lying to the compiler, since it is possible for the
string pointed to by a single address s to differ between calls to _.
The __pure__ attribute would be more honest, but for reasons I don’t
understand it suppresses the optimization.

Moral of the story: at least in simple cases, we can keep the
performance and the typechecking.  Phew.

HTH,
Jonathan