On 27/06/2014 11:25 a.m., Mark DeCheser wrote:
> Hi everyone --
>
> I recently ran into a strange condition within my Squid access logs which
> is making importing the events into a database a bit more difficult.
> Note, I am not logging directly to a database, but rather parsing event
> into a centralized database via batch/cron.
>
> Events in the access log, mainly which I see are in the ContentType field,
> are being recorded as non-ASCII characters. When I attempt to import the
> log into PostgreSQL, psql barfs.
>
> Our logfile format in our Squid config looks like this:
>
> logformat my-custom %la,%>a,%10tr,%>st,%<st,%rm,%03>Hs,%mt,%[un,%tg
> access_log /var/log/squid/access.log my-custom
>
> Some examples of the events look like this:
>
> [serverIP],[clientIP],
> 4012,692,498,GET,200,º^_x°*,username,20/Jun/2014:00:06:36

The log format you used does not match this log line. The format produces:

  [squid-listening-IP],[clientIP],
  4012,692,498,GET,200,º^_x°*,username,20/Jun/2014:00:06:36

>
> I'm running Squid instances on VPSes in a number of different countries.
> This particular Squid instance is in Norway, and coincidentally enough
> happens to be the only VPS delivered to my organization that wasn't
> already set to en_US.UTF-8.
>
> # cat /etc/sysconfig/i18n
> LANG="en_US.UTF-8"
> SYSFONT="latarcyrheb-sun16"
> # echo $LANG
> en_US.UTF-8
>
> It could be a coincidence, but based on the fact that I have instances all
> over the world, and only this instance is giving me trouble ... I found it
> to be an odd coincidence.
>
> Ideally, if it's possible for Squid to force some kind of hex encoding for
> this Content-Type (or really, for any field that receives non ASCII
> characters), that would be optimal. There are downstream alternatives
> which include finding / replacing non-ASCII chars in a preparation script.
> There's also the option to change the charset of the database itself so
> that it doesn't complain about the charset, but these alternatives seem a
> little reactionary.
>
> I've reviewed: http://www.squid-cache.org/Doc/config/logformat/
> I also tried using iconv unsuccessfully:
> http://stackoverflow.com/questions/12999651/how-to-remove-non-utf-8-characters-from-text-file
>
> It essentially leaves me with offset fields/columns in the logfile.
>
> I also reviewed Amos' comment here:
> http://www.squid-cache.org/mail-archive/squid-users/201109/0343.html
>
> The difference in my case is that I'm dealing with Content-Type, not URL.

URL-encoding is the %xx character encoding. It can be (and is) applied to
anything which can legitimately contain non-ASCII characters or ASCII
special characters. The Content-Type header is not one of those places.

You can use the '#' format modifier to URL-encode that %mt field
explicitly. Like so:

  %#mt

If you share the exact Squid version you are using, I would also like to
check the code to see whether the %mt value is being set up correctly. That
log entry looks a bit like random memory being displayed as if it were text.

Amos
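
For the downstream "preparation script" alternative mentioned in the quoted
message, here is a minimal sketch, assuming Python 3 is available on the box
doing the import. It is not part of Squid, and the function names
escape_non_ascii and escape_stream are hypothetical. It applies the same
%xx idea outside Squid: every byte that is not printable ASCII is replaced
with %XX, and commas are left alone so the field positions do not shift.

```python
import io


def escape_non_ascii(raw: bytes) -> str:
    """Percent-encode every byte that is not printable ASCII.

    '%' itself is also encoded (as %25) so the result can be decoded
    unambiguously later. Commas are printable ASCII and pass through,
    which keeps the comma-separated columns aligned.
    """
    out = []
    for b in raw:
        if 0x20 <= b <= 0x7E and b != 0x25:
            out.append(chr(b))
        else:
            out.append("%{:02X}".format(b))
    return "".join(out)


def escape_stream(src, dst) -> None:
    """Copy log lines from a binary stream to a text stream, escaped.

    Reading bytes (not text) means input that is invalid in the current
    locale cannot raise a decoding error.
    """
    for raw in src:
        dst.write(escape_non_ascii(raw.rstrip(b"\r\n")) + "\n")
```

To run it as a filter in the cron job, one option is to call
escape_stream(sys.stdin.buffer, sys.stdout) after importing sys, then pipe
the access log through the script before handing it to psql.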