Initial ugly reverse-translator

Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> · Sat, 19 Apr 2008 22:52:07 +0800

Hi all

I've chucked together a quick and very ugly script to read the .po files 
from the backend and produce a simple database to map translations back 
to the original strings and their source locations. It's a very dirty 
.po reader that doesn't try to parse the format properly, but it does 
the job. There's no search interface yet; this is just intended to get
to the point where useful queries can be run on the data and the most 
effective queries can be figured out.
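The post doesn't show poread.py's parsing code, so what follows is only a hypothetical sketch of the kind of quick .po reading described above. It is not the author's script. It assumes GNU gettext's usual layout: `#:` lines carry `file:line` references, `#,` lines carry flags such as `c-format`, and `msgid`/`msgstr` values may continue on following quoted lines. The names `parse_po`, `_unquote` and `_new_entry` are made up for illustration. Plural forms, `msgctxt` and obsolete `#~` entries are deliberately skipped.

```python
import re

# Hypothetical minimal .po reader (not poread.py). Decodes the common C escapes.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def _unquote(line):
    """Return the text between the first and last double quote, escapes decoded."""
    start, end = line.find('"'), line.rfind('"')
    if start == -1 or end <= start:
        return ""
    body = line[start + 1:end]
    return re.sub(r'\\(.)', lambda m: _ESCAPES.get(m.group(1), m.group(0)), body)


def _new_entry():
    return {"msgid": None, "msgstr": "", "locations": [], "is_format": False}


def parse_po(text):
    """Return translated entries as dicts: msgid, msgstr, locations, is_format."""
    entries, cur, field = [], _new_entry(), None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#~"):
            continue  # blank line or obsolete entry
        # A comment, msgctxt or msgid after a msgid means a new entry has begun.
        if cur["msgid"] is not None and (
                line.startswith("#") or line.startswith(("msgctxt", "msgid "))):
            entries.append(cur)
            cur, field = _new_entry(), None
        if line.startswith("#:"):
            # Source references look like "utils/adt/array_userfuncs.c:121".
            for ref in line[2:].split():
                path, _, lineno = ref.rpartition(":")
                if lineno.isdigit():
                    cur["locations"].append((path, int(lineno)))
        elif line.startswith("#,"):
            flags = [f.strip() for f in line[2:].split(",")]
            cur["is_format"] = "c-format" in flags
        elif line.startswith("msgid "):
            field, cur["msgid"] = "msgid", _unquote(line)
        elif line.startswith("msgstr "):
            field, cur["msgstr"] = "msgstr", _unquote(line)
        elif line.startswith('"') and field:
            cur[field] += _unquote(line)  # continuation of a multi-line string
        elif not line.startswith("#"):
            field = None  # msgctxt, msgid_plural, msgstr[n]: not handled here
    if cur["msgid"] is not None:
        entries.append(cur)
    # Skip the header (empty msgid) and untranslated or plural entries.
    return [e for e in entries if e["msgid"] and e["msgstr"]]
```

Each returned entry would presumably map onto one message row (msgid plus the format flag), one translation row (msgstr) and one location row per `(file, line)` pair, like the tables queried below.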

Right now, queries against errors without format-string substitutions
work OK, if not great, with pg_trgm-based lookups, e.g.:

test=# SELECT message_id, is_format, message, translation
test-# FROM po_translation INNER JOIN po_message ON po_translation.message_id = po_message.id
test-# WHERE 'el valor de array debe comenzar con «{» o información de dimensión' % translation
test-# ORDER BY similarity('el valor de array debe comenzar con «{» o información de dimensión', translation) desc;

 message_id | is_format |                          message                           |                             translation
------------+-----------+------------------------------------------------------------+---------------------------------------------------------------------
       4470 | f         | array value must start with \"{\" or dimension information | el valor de array debe comenzar con «{» o información de dimensión"
       4437 | f         | argument must be empty or one-dimensional array            | el argumento debe ser vacío o un array unidimensional"
(2 rows)

test=# SELECT DISTINCT srcfile, srcline FROM po_location WHERE message_id = 4437;
                          srcfile                           | srcline
-------------------------------------------------------------+---------
/a/pgsql/HEAD/pgtst/src/backend/utils/adt/array_userfuncs.c |     121
utils/adt/array_userfuncs.c                                 |      99
utils/adt/array_userfuncs.c                                 |     121
utils/adt/array_userfuncs.c                                 |     124
(4 rows)
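For anyone unfamiliar with pg_trgm, the `%` filter and `similarity()` ranking in the first query can be roughly approximated in a few lines. This is an illustrative re-implementation, not pg_trgm's code. It assumes pg_trgm's documented behaviour: text is lowercased, non-alphanumeric characters separate words, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in either string. The 0.3 cut-off is pg_trgm's default threshold, and `trigrams`, `similarity` and `matches` are hypothetical names.

```python
import re


def trigrams(text):
    """Approximate pg_trgm's trigram set for one string."""
    grams = set()
    # Runs of letters/digits are words; everything else (quotes, «{», etc.) separates them.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams in either string (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def matches(a, b, threshold=0.3):
    """Rough analogue of pg_trgm's `a % b` using the default 0.3 threshold."""
    return similarity(a, b) >= threshold
```

Because punctuation only separates words, the stray trailing `"` that the quick parser leaves on each translation does not change the trigram set, so it shouldn't hurt these lookups.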

It's also useful for format-string-based messages, but more thought is
needed on how best to handle them. A LIKE query using the format-string
message as the pattern (after converting its printf-style placeholders,
such as %s and %d, to SQL LIKE wildcards) would be (a) slow and (b) very
sensitive to formatting and other variation. I haven't spent any time on
that bit yet, but if anybody has any ideas I'd be glad to hear them.
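The placeholder-to-wildcard conversion itself is mechanical. The sketch below is hypothetical (`format_to_like` and `_escape_like` are made-up names) and assumes a valid C-format string: every printf conversion, including positional forms like `%2$s` that translations use to reorder arguments and PostgreSQL's `%m`, becomes `%`. A literal `%%` and the LIKE metacharacters `%`, `_` and `\` are escaped with backslash, which is PostgreSQL's default LIKE escape character.

```python
import re

# One printf conversion: optional n$ position, flags, width, precision,
# length modifier, then the conversion character (%m is PostgreSQL's strerror).
_CONVERSION = re.compile(
    r"%(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfgGcspm%]")


def _escape_like(literal, escape="\\"):
    """Escape LIKE metacharacters so they match themselves."""
    for ch in (escape, "%", "_"):  # escape character first, so it isn't doubled later
        literal = literal.replace(ch, escape + ch)
    return literal


def format_to_like(fmt):
    """Turn a C-format message into a LIKE pattern; each conversion becomes %."""
    out, pos = [], 0
    for m in _CONVERSION.finditer(fmt):
        out.append(_escape_like(fmt[pos:m.start()]))
        # "%%" in a format string is a literal percent sign, not a placeholder.
        out.append(_escape_like("%") if m.group() == "%%" else "%")
        pos = m.end()
    out.append(_escape_like(fmt[pos:]))
    return "".join(out)
```

Used as `WHERE 'text the user pasted' LIKE pattern`, the pattern comes from each stored row, so an index on that column can't help and every translation has to be tested. That is consistent with the slowness expected above.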

Anyway, the initial version of the script can be found at:

http://www.postnewspapers.com.au/~craig/poread.py

Consider running it in a fresh database, as it's extremely poorly tested,
was written very quickly and dirtily, and contains DDL commands. The schema
can be found inline in the script. The psycopg2 Python module is
required, and the pg_trgm contrib module must be loaded in the database
you run the script against.
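The real schema lives inline in poread.py and isn't reproduced in this post. As a hypothetical sketch only, here is a minimal schema containing just the tables and columns the queries above use, with a lookup that goes from a translation back to the original message and its source locations. It swaps in Python's built-in sqlite3 so it runs without a server, and it uses exact matching because SQLite has no pg_trgm. The sample rows are the message 4437 values shown earlier, and `source_locations` is a made-up name.

```python
import sqlite3

# Hypothetical minimal schema: only columns that appear in the queries above.
# poread.py's actual schema may differ.
SCHEMA = """
CREATE TABLE po_message (
    id        INTEGER PRIMARY KEY,
    is_format BOOLEAN NOT NULL,
    message   TEXT NOT NULL
);
CREATE TABLE po_translation (
    message_id  INTEGER NOT NULL REFERENCES po_message(id),
    translation TEXT NOT NULL
);
CREATE TABLE po_location (
    message_id INTEGER NOT NULL REFERENCES po_message(id),
    srcfile    TEXT NOT NULL,
    srcline    INTEGER NOT NULL
);
"""


def source_locations(conn, translation):
    """Map an exact translated string back to its message and source locations."""
    return conn.execute(
        """SELECT DISTINCT m.message, l.srcfile, l.srcline
             FROM po_translation t
             JOIN po_message m ON m.id = t.message_id
             JOIN po_location l ON l.message_id = m.id
            WHERE t.translation = ?
            ORDER BY l.srcfile, l.srcline""",
        (translation,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO po_message VALUES "
             "(4437, 0, 'argument must be empty or one-dimensional array')")
conn.execute("INSERT INTO po_translation VALUES "
             "(4437, 'el argumento debe ser vacío o un array unidimensional')")
conn.executemany("INSERT INTO po_location VALUES (4437, ?, ?)",
                 [("utils/adt/array_userfuncs.c", 99),
                  ("utils/adt/array_userfuncs.c", 121)])
```

In PostgreSQL, the exact `=` match would be replaced by the `%` filter and `similarity()` ordering from the first query.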

Once I'm happy with the queries for translation lookups I'll bang 
together a quick web interface for the script and clean it up. At that 
point it might start being useful to people here.

--
Craig Ringer