
Initial ugly reverse-translator


Hi all

I've chucked together a quick and very ugly script that reads the backend's .po files and builds a simple database mapping translations back to the original strings and their source locations. It's a very dirty .po reader that doesn't try to parse the format properly, but it does the job. There's no search interface yet; the aim for now is just to get to the point where useful queries can be run on the data and the most effective ones worked out.
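To give a rough idea of what such a reader involves, here is a minimal sketch. It is not the code from the script; the function name `read_po` and its return shape are made up for illustration. It assumes the usual gettext layout of `#:` reference comments followed by `msgid` and `msgstr` entries, and it ignores plural forms, msgctxt and most escape sequences.

```python
def _unquote(text):
    """Strip the surrounding double quotes and undo the two commonest escapes."""
    text = text.strip()
    return text[1:-1].replace('\\"', '"').replace('\\n', '\n')


def read_po(lines):
    """Yield (msgid, msgstr, locations) tuples from an iterable of .po lines.

    locations is a list of (source_file, line_number) pairs taken from
    the '#:' reference comments that precede each entry.
    """
    locations, msgid, msgstr, target = [], [], [], None
    # The extra empty line flushes the final entry.
    for raw in list(lines) + [""]:
        line = raw.strip()
        if line.startswith("#:"):
            for ref in line[2:].split():
                path, _, lineno = ref.rpartition(":")
                if path and lineno.isdigit():
                    locations.append((path, int(lineno)))
        elif line.startswith("#"):
            continue  # translator comments, flags, etc.
        elif line.startswith("msgid "):
            target = msgid
            target.append(_unquote(line[6:]))
        elif line.startswith("msgstr "):
            target = msgstr
            target.append(_unquote(line[7:]))
        elif line.startswith('"') and target is not None:
            target.append(_unquote(line))  # continuation of a long string
        elif not line:
            text = "".join(msgid)
            if text:  # the header entry has an empty msgid; skip it
                yield text, "".join(msgstr), locations
            locations, msgid, msgstr, target = [], [], [], None
        else:
            target = None  # msgid_plural, msgstr[n], msgctxt: not handled


sample = [
    "#: utils/adt/array_userfuncs.c:99 utils/adt/array_userfuncs.c:121",
    'msgid "argument must be empty or one-dimensional array"',
    'msgstr "el argumento debe ser vacío o un array unidimensional"',
]
for message, translation, where in read_po(sample):
    print(message, "|", translation, "|", where)
```

Each tuple would then become one message row, one translation row and one location row per source reference.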

Right now, queries against errors without format-string substitutions work OK, if not great, with pg_trgm-based lookups, e.g.:

test=# SELECT message_id, is_format, message, translation
test-# FROM po_translation INNER JOIN po_message ON po_translation.message_id = po_message.id
test-# WHERE 'el valor de array debe comenzar con «{» o información de dimensión' % translation
test-# ORDER BY similarity('el valor de array debe comenzar con «{» o información de dimensión', translation) DESC;

 message_id | is_format |                          message                           |                             translation
------------+-----------+------------------------------------------------------------+---------------------------------------------------------------------
       4470 | f         | array value must start with \"{\" or dimension information | el valor de array debe comenzar con «{» o información de dimensión"
       4437 | f         | argument must be empty or one-dimensional array            | el argumento debe ser vacío o un array unidimensional"
(2 rows)

test=# SELECT DISTINCT srcfile, srcline FROM po_location WHERE message_id = 4437;
                          srcfile                           | srcline
-------------------------------------------------------------+---------
 /a/pgsql/HEAD/pgtst/src/backend/utils/adt/array_userfuncs.c |     121
 utils/adt/array_userfuncs.c                                 |      99
 utils/adt/array_userfuncs.c                                 |     121
 utils/adt/array_userfuncs.c                                 |     124
(4 rows)
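For anyone unfamiliar with the trigram matching behind the `%` operator and similarity() in the first query above, the following is a rough pure-Python approximation, not the contrib module's actual code. It assumes the documented behaviour: strings are lower-cased and split into alphanumeric words, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings. The function names are mine.

```python
import re


def trigrams(text):
    """Return an approximation of the trigram set the module would extract."""
    grams = set()
    # Words are runs of letters and digits; everything else is a separator.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "el valor de array debe comenzar con «{» o información de dimensión"
candidates = [
    "el valor de array debe comenzar con «{» o información de dimensión",
    "el argumento debe ser vacío o un array unidimensional",
]
for candidate in sorted(candidates, key=lambda c: similarity(query, c), reverse=True):
    print(round(similarity(query, candidate), 2), candidate)
```

The `%` operator is true when the similarity exceeds a threshold (0.3 by default, as far as I know). That is why loosely related translations that share a few words can also turn up, and why the ORDER BY on similarity() matters.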

It's also useful for format-string based messages, but more thought is needed on how best to handle them. A LIKE query using the format-string message as the pattern (after converting the pattern syntax to SQL style) would be (a) slow and (b) very sensitive to formatting and other variation. I haven't spent any time on that bit yet, but if anybody has any ideas I'd be glad to hear them.
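To make the pattern conversion mentioned above concrete, here is one possible sketch that turns a printf-style message into a LIKE pattern. The helper names are hypothetical, the regular expression covers only the common conversion syntax (including the `%m` seen in backend messages), and it does nothing about the slowness or the sensitivity to formatting.

```python
import re

# "%%", or "%" followed by optional position, flags, width, precision,
# length modifier and a conversion letter.
_FORMAT_SPEC = re.compile(
    r"%%|%(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|z|j|t|L)?[a-zA-Z]"
)


def _escape_literal(text, escape):
    """Escape characters that LIKE would otherwise treat as special."""
    for ch in (escape, "%", "_"):
        text = text.replace(ch, escape + ch)
    return text


def format_to_like(fmt, escape="\\"):
    """Convert a printf-style format string into an SQL LIKE pattern."""
    parts, pos = [], 0
    for match in _FORMAT_SPEC.finditer(fmt):
        parts.append(_escape_literal(fmt[pos:match.start()], escape))
        # "%%" is a literal percent sign; any other spec matches anything.
        parts.append(escape + "%" if match.group() == "%%" else "%")
        pos = match.end()
    parts.append(_escape_literal(fmt[pos:], escape))
    return "".join(parts)


print(format_to_like('could not open file "%s": %m'))
```

The resulting pattern would go on the right-hand side of LIKE, with the error text the user pasted on the left, so every stored pattern has to be tried against the input in turn.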

Anyway, the initial version of the script can be found at:

http://www.postnewspapers.com.au/~craig/poread.py

Consider running it in a new database, as it's extremely poorly tested, was written very quickly and dirtily, and contains DDL commands. The schema can be found inline in the script. The psycopg2 Python module is required, and the pg_trgm contrib module must be loaded in the database you use the script with.

Once I'm happy with the queries for translation lookups I'll bang together a quick web interface for the script and clean it up. At that point it might start being useful to people here.

--
Craig Ringer

