I have written a log daemon application in Python that writes Squid's log
data into PostgreSQL; however, it periodically fails with:
Invalid byte sequence for encoding "UTF8": 0xe2 0x3f 0x27
Obviously it's receiving some data that it can't encode as UTF-8 and
write to the database, but I can't figure out a way to capture the
incoming data so I can see exactly what it's receiving that it can't
encode.
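One way to see exactly which bytes are being rejected is to decode each
raw line as UTF-8 in Python before it ever reaches PostgreSQL, and report
where decoding fails. This is a minimal sketch. It assumes Python 3 and
that the daemon can get at each line as raw bytes. describe_bad_utf8 is
a hypothetical helper name, not part of any library.

```python
from typing import Optional


def describe_bad_utf8(raw: bytes, context: int = 20) -> Optional[str]:
    """Return a readable report of the first invalid UTF-8 sequence in
    ``raw``, or None if ``raw`` decodes cleanly."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError as exc:
        # exc.start / exc.end index the offending bytes inside ``raw``.
        lo = max(0, exc.start - context)
        hi = min(len(raw), exc.end + context)
        bad = raw[exc.start:exc.end]
        return "bad bytes %s at offset %d (%s); context: %r" % (
            " ".join("0x%02x" % b for b in bad),
            exc.start,
            exc.reason,
            raw[lo:hi],
        )
```

The context slice shows the surrounding bytes with repr(), so the
report stays printable even when the data itself is not valid UTF-8.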
Every attempt I have made to use Python's built-in try/except mechanism
to catch the error just stops logging entirely when triggered, instead
of running the except block I wrote to dump the data to a text file.
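The following sketch shows a loop that records bad lines instead of
dying. It assumes Python 3 and that the daemon reads Squid's output from
stdin. The store callback stands in for the real database insert, which
is hypothetical here. The quarantine path /tmp/squid_bad_lines.log is
only an example.

```python
import sys
from typing import Callable, Iterable, TextIO


def run(lines: Iterable[bytes], store: Callable[[str], None],
        quarantine: TextIO) -> int:
    """Pass each valid line to ``store``; write invalid lines to
    ``quarantine`` as repr() of the raw bytes and keep going.

    Returns the number of quarantined lines."""
    bad = 0
    for raw in lines:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # repr() shows the exact offending bytes, e.g. b"...\xe2?'..."
            quarantine.write(repr(raw) + "\n")
            quarantine.flush()
            bad += 1
            continue  # skip this line but keep logging the rest
        store(text.rstrip("\r\n"))
    return bad


def main() -> None:
    # Hypothetical wiring: read Squid's output as raw bytes (stdin.buffer)
    # so no decoding happens implicitly, and quarantine to an example path.
    with open("/tmp/squid_bad_lines.log", "a") as qf:
        run(sys.stdin.buffer, lambda line: None, qf)  # replace with real INSERT
```

If the database call itself is what raises (for example a psycopg2
DataError), keep in mind that PostgreSQL leaves the transaction aborted,
and every later statement on that connection is rejected until
rollback() is called. That may be one reason logging appears to stop
entirely after the first bad line.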
While I continue to try to figure that out, does anyone have more
information about which encodings/character sets Squid can output in
the log data? I am assuming it's a special character in the request_url
field that's causing the problem, but I haven't the slightest clue which
one, as I haven't been able to trigger it in my test environment, only
in production.
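The bytes in the error are informative. In UTF-8, 0xe2 is the lead byte
of a three-byte sequence, so it must be followed by two continuation
bytes in the range 0x80-0xBF. For example, U+2019, the curly apostrophe,
encodes as e2 80 99. Here, though, 0xe2 is followed by 0x3f ('?') and
0x27 ('). One possible explanation, which is not confirmed, is a client
sending a single Latin-1/cp1252 byte such as 'â' (0xe2). Another is a
curly quote whose trailing bytes were replaced with '?' somewhere along
the way. Whatever the source, the sketch below keeps such rows
insertable while preserving the offending byte value. It assumes Python
3.5+, where decoding supports errors="backslashreplace". sanitize and
is_continuation_byte are hypothetical helper names.

```python
def sanitize(raw: bytes) -> str:
    """Decode bytes for a UTF8 database without ever raising.

    Valid UTF-8 passes through unchanged; each invalid byte is replaced
    by a visible backslash escape, so the row can still be inserted and
    the original byte value stays readable in the stored text."""
    return raw.decode("utf-8", errors="backslashreplace")


def is_continuation_byte(b: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF).
    return (b & 0xC0) == 0x80
```

The trade-off is that the stored value no longer matches the raw bytes
exactly. Decoding as latin-1 would also never fail, but it would silently
turn invalid UTF-8 into plausible-looking characters instead of
flagging them.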
I am using a custom log output format that is basically the default,
with the field separators changed to |~| to make parsing the output
into columns easier:
logformat SQL
%ts.%03tu|~|%6tr|~|%>a|~|%Ss|~|%03>Hs|~|%<st|~|%rm|~|%ru|~|%[un|~|%Sh|~|%<a|~|%mt
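Because the separators are fixed, each line can be split into its twelve
columns and each column checked on its own. That would confirm or rule
out the request_url theory. The sketch below assumes Python 3.7+, which
it needs for ordered dicts. The column names are hypothetical labels
chosen to match the format codes above, not names Squid defines.

```python
from typing import Dict, List

# Hypothetical column names, in the order of the "SQL" logformat:
# %ts.%03tu %6tr %>a %Ss %03>Hs %<st %rm %ru %[un %Sh %<a %mt
FIELDS: List[str] = [
    "timestamp", "elapsed_ms", "client_addr", "squid_status",
    "http_status", "reply_size", "method", "url", "user",
    "hierarchy", "server_addr", "mime_type",
]
SEP = b"|~|"


def split_record(raw: bytes) -> Dict[str, bytes]:
    """Split one raw log line into named byte fields."""
    parts = raw.rstrip(b"\r\n").split(SEP)
    if len(parts) != len(FIELDS):
        # e.g. a URL that itself contains "|~|" yields extra columns
        raise ValueError("expected %d fields, got %d"
                         % (len(FIELDS), len(parts)))
    # %6tr is space-padded, so strip surrounding whitespace from each field
    return {name: value.strip() for name, value in zip(FIELDS, parts)}


def bad_fields(raw: bytes) -> List[str]:
    """Names of the fields whose bytes are not valid UTF-8."""
    bad = []
    for name, value in split_record(raw).items():
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad
```

A request URL that itself contains |~| would produce more than twelve
columns. In that case split_record raises ValueError, so such lines can
be quarantined as well.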
--
Thanks,
Dean E. Weimer
http://www.dweimer.net/