Re: [BUGS] main log encoding problem

Alban Hertroys <haramrae@xxxxxxxxx> · Thu, 19 Jul 2012 14:16:10 +0200

On 19 July 2012 13:50, Alexander Law <exclusion@xxxxxxxxx> wrote:
>> I like Craig's idea of adding the client encoding to the log lines. A
>> possible problem with that (I'm not an encoding expert) is that a log
>> line like that will contain data about the database server meta-data
>> (log time, client encoding, etc) in the database default encoding and
>> database data (the logged query and user-supplied values) in the
>> client encoding. One option would be to use the client encoding for
>> the entire log line, but would that result in legible meta-data in
>> every encoding?
>
> I think we would then get logs that are not human-readable. We would need
> yet another tool to open and convert the log (and to strip the redundant
> encoding specification from each line).

Only the parts that contain user-supplied data in very different
encodings would not be "human readable", similar to what we already
have.
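
To make the per-line encoding idea concrete, here is a rough Python sketch. It assumes a made-up line format (nothing PostgreSQL actually writes): the metadata stays in plain ASCII, the client encoding name is tagged in brackets, and the statement is kept as the raw bytes the client sent. It also assumes an ASCII-compatible client encoding, Python codec names rather than PostgreSQL's (cp1251 rather than WIN1251), and statements without embedded newlines. The function names are hypothetical.

```python
from typing import Tuple


def format_log_line(timestamp: str, client_encoding: str, statement: bytes) -> bytes:
    """Build one log line: ASCII metadata, then the statement bytes untouched."""
    # The header (timestamp + encoding tag) is pure ASCII, so it stays legible
    # no matter which ASCII-compatible encoding the payload uses.
    header = f"{timestamp} [{client_encoding}] ".encode("ascii")
    return header + statement + b"\n"


def read_log_line(line: bytes) -> Tuple[str, str, str]:
    """Split a line into (timestamp, encoding, statement decoded via its tag)."""
    line = line.rstrip(b"\n")
    open_idx = line.index(b" [")             # end of the timestamp
    close_idx = line.index(b"] ", open_idx)  # end of the encoding tag
    timestamp = line[:open_idx].decode("ascii")
    encoding = line[open_idx + 2:close_idx].decode("ascii")
    statement = line[close_idx + 2:].decode(encoding)
    return timestamp, encoding, statement


line = format_log_line("2012-07-19 14:16:10", "cp1251", "SELECT 'привет'".encode("cp1251"))
print(read_log_line(line))
```

A reader only needs the tag to decode each line, and the metadata remains legible whatever the payload encoding is; only the payload itself looks like garbage to a viewer that ignores the tag.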

>> It appears that the primary problem here is that SQL statements and
>> user-supplied data are being logged, while the log-file is a text file
>> in a fixed encoding.
>
> Yes, and in my opinion there is nothing unusual about it. XML/HTML are
> examples of text files with a fixed encoding that can contain multi-language
> strings. UTF-8 is the default encoding for XML. And when it's not good
> enough (as Tatsou noticed), you can still switch to another.

Yes, but in those examples it is acceptable for the application to fail
to write the output. Moreover, that output has to be converted to
various client encodings (namely that of the visitor's browser)
anyway, so it does not add any real overhead.

This doesn't hold true for database server log files. Ideally, writing
them must be reliable (how else are you going to catch errors?) and
should not significantly impact the performance of the database server
(the less the better). The end result will probably be a compromise
somewhere in the middle.
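
On the reliability point: one way to guarantee that a line always gets written to a log with a single fixed encoding is to never let the transcoding step raise. A minimal Python sketch, assuming such a fixed log encoding; the function name is hypothetical and this is not how the PostgreSQL server handles it:

```python
def to_log_encoding(text: str, log_encoding: str) -> bytes:
    """Encode a message for a log file with one fixed encoding, without ever raising."""
    # errors="backslashreplace" writes characters the log encoding cannot
    # represent as \uXXXX escapes instead of raising UnicodeEncodeError,
    # so the line is always written, at the cost of readability.
    return text.encode(log_encoding, errors="backslashreplace")


print(to_log_encoding("SELECT 'привет'", "latin-1"))
```

The escaped line is harder to read, but it is never lost, which matters more for a server log than for a web page.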

>> Perhaps another solution would be to add the ability to log certain
>> types of information (not the core database server log info, of
>> course!) to a database/table so that each record can be stored in its
>> own encoding?
>> That way the transcoding doesn't have to take place until someone is
>> reading the log, you'd know what to transcode the data to (namely the
>> client_encoding of the reading session) and there isn't any issue of
>> transcoding errors while logging statements.
>
> I don't think it would be the simplest solution to the existing problem. It
> could be another branch of evolution, but it doesn't answer the question of
> what encoding to use for the core database server log.

It makes that problem much easier. If you need the "human-readable"
logs, you can write those to a different log (namely one in the
database). The result is that the server can use pretty much any
encoding (or a mix of multiple!) to write its log files.

You'll need a query to read the human-readable logs of course, but
since they're in the database, all the tools you need are already
available to you.
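
A rough sketch of the table-based idea, using Python's built-in sqlite3 module as a stand-in for the database. The log_entry table and the helper functions are hypothetical, and Python codec names are assumed for the encodings. The point is that each row stores the raw statement bytes together with the encoding they arrived in, and transcoding happens only when someone reads the log, into that reader's encoding.

```python
import sqlite3
from typing import Iterator, Tuple


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical log_entry table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_entry ("
        " id INTEGER PRIMARY KEY,"
        " logged_at TEXT NOT NULL,"
        " client_encoding TEXT NOT NULL,"
        " statement BLOB NOT NULL)"
    )
    return conn


def write_entry(conn: sqlite3.Connection, logged_at: str,
                client_encoding: str, raw_statement: bytes) -> None:
    """Store the statement bytes as received, tagged with their encoding; no transcoding."""
    conn.execute(
        "INSERT INTO log_entry (logged_at, client_encoding, statement) VALUES (?, ?, ?)",
        (logged_at, client_encoding, raw_statement),
    )
    conn.commit()


def read_entries(conn: sqlite3.Connection,
                 reader_encoding: str) -> Iterator[Tuple[str, bytes]]:
    """Transcode each stored statement into the reading session's encoding."""
    rows = conn.execute(
        "SELECT logged_at, client_encoding, statement FROM log_entry ORDER BY id"
    )
    for logged_at, encoding, raw in rows:
        text = raw.decode(encoding)  # decode with the encoding stored in the row
        # Escape characters the reader's encoding cannot represent instead of failing.
        yield logged_at, text.encode(reader_encoding, errors="backslashreplace")


conn = open_log()
write_entry(conn, "2012-07-19 14:16:10", "cp1251", "SELECT 'привет'".encode("cp1251"))
write_entry(conn, "2012-07-19 14:16:11", "euc_jp", "SELECT 'こんにちは'".encode("euc_jp"))
for logged_at, statement in read_entries(conn, "utf-8"):
    print(logged_at, statement.decode("utf-8"))
```

Rows in different encodings coexist in one table, and nothing is transcoded on the write path, so a conversion failure can never stop a statement from being logged.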

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general