
Re: Force ASCII encoding for access.log fields?


 



On 27/06/2014 11:25 a.m., Mark DeCheser wrote:
> Hi everyone --
> 
> I recently ran into a strange condition within my Squid access logs which
> is making importing the events into a database a bit more difficult. 
> Note, I am not logging directly to a database, but rather parsing events
> into a centralized database via batch/cron.
> 
> Some fields in the access log, mainly the Content-Type field from what I
> can see, are being recorded with non-ASCII characters.  When I attempt to
> import the log into PostgreSQL, psql barfs.
> 
> Our logfile format in our Squid config looks like this:
> 
> logformat my-custom %la,%>a,%10tr,%>st,%<st,%rm,%03>Hs,%mt,%[un,%tg
> access_log /var/log/squid/access.log my-custom
> 
> Some examples of the events look like this:
> 
> [serverIP],[clientIP],
> 4012,692,498,GET,200,º^_x°*,username,20/Jun/2014:00:06:36

The log format you used does not quite match this log line: %la logs the
local IP address Squid is listening on, not the server IP. The format
produces:

[squid-listening-IP],[clientIP],
4012,692,498,GET,200,º^_x°*,username,20/Jun/2014:00:06:36

> 
> I'm running Squid instances on VPSes in a number of different countries. 
> This particular Squid instance is in Norway, and coincidentally enough
> happens to be the only VPS delivered to my organization that wasn't
> already set to en_US.UTF-8.
> 
> # cat /etc/sysconfig/i18n
> LANG="en_US.UTF-8"
> SYSFONT="latarcyrheb-sun16"
> # echo $LANG
> en_US.UTF-8
> 
> It could be a coincidence, but based on the fact that I have instances all
> over the world, and only this instance is giving me trouble ... I found it
> to be an odd coincidence.
> 
> Ideally, if it's possible for Squid to force some kind of hex encoding for
> this Content-Type (or really, for any field that receives non ASCII
> characters), that would be optimal.   There are downstream alternatives
> which include finding / replacing non-ASCII chars in a preparation script.
>  There's also the option to change the charset of the database itself so
> that it doesn't complain about the charset, but these alternatives seem a
> little reactionary.
> 
> I've reviewed:  http://www.squid-cache.org/Doc/config/logformat/
> I also tried using iconv unsuccessfully: 
> http://stackoverflow.com/questions/12999651/how-to-remove-non-utf-8-characters-from-text-file
> 
> It essentially leaves me with offset fields/columns in the logfile.
> 
> I also reviewed Amos' comment here: 
> http://www.squid-cache.org/mail-archive/squid-users/201109/0343.html
> 
> The difference in my case is that I'm dealing with Content-Type, not URL. 

URL-encoding is the %xx character encoding. It can be (and is) applied
to anything which can legitimately contain non-ASCII characters or ASCII
special characters. The Content-Type header is not one of those places.

You can use the '#' format modifier to URL-encode that %mt field
explicitly. Like so:  %#mt
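
As a sketch, assuming the rest of the format from the original post stays
unchanged, the configuration with only the %mt field switched to its
URL-encoded form would look like this:

```
logformat my-custom %la,%>a,%10tr,%>st,%<st,%rm,%03>Hs,%#mt,%[un,%tg
access_log /var/log/squid/access.log my-custom
```

Any non-ASCII bytes in the MIME type field should then appear in the log as
%xx sequences, which keeps that field plain ASCII for the import.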

If you share the exact Squid version you are using, I would also like to
check the code to see whether the %mt handling is set up correctly. That
log entry looks a bit like random memory being displayed as if it were
text.

Amos



