On Fri, 2 Sep 2005, Darryl L. Miles wrote:
Just to confirm with you: it is the ESCAPING mechanism that I want to
correct; it's not clear whether this is the route your patch has taken.
It is.
The escaping is already there; what was wrong was the automatic selection
of which escaping mechanism to use. There are four (or actually five)
different escaping mechanisms available; see the docs for the logformat
directive. Squid tries to automatically select an appropriate escaping
mechanism based on what the surrounding text in the format string looks
like (if inside double quotes, escape suitably for a double-quoted
string; if inside [], use the Squid MIME escaping rule; etc.).
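
To make the idea of context-based selection concrete, here is a minimal C sketch. It is not Squid's implementation: the enum, the function name guess_quote_mode and the scanning rule are all hypothetical, and it assumes a format string that uses no explicit quoting modifiers (a `%"ru` or `%[...` token would confuse this simple scan).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum quote_mode { QUOTE_NONE, QUOTE_STRING, QUOTE_MIME };

/*
 * Guess which escaping applies to the format code that starts at
 * fmt[pos], by looking only at the literal text before it:
 * inside "..." -> quoted-string escaping, inside [...] -> MIME-style
 * escaping, otherwise no context-specific escaping.
 */
enum quote_mode guess_quote_mode(const char *fmt, size_t pos)
{
    int in_dquote = 0;
    int in_bracket = 0;
    size_t i;

    assert(fmt != NULL);

    for (i = 0; i < pos && fmt[i] != '\0'; i++) {
        if (fmt[i] == '"')
            in_dquote = !in_dquote;          /* toggle on each double quote */
        else if (!in_dquote && fmt[i] == '[')
            in_bracket = 1;
        else if (!in_dquote && fmt[i] == ']')
            in_bracket = 0;
    }

    if (in_dquote)
        return QUOTE_STRING;
    if (in_bracket)
        return QUOTE_MIME;
    return QUOTE_NONE;
}
```

With a format string such as `%>a [%tl] "%rm %ru" %Hs`, this sketch would pick the quoted-string rule for %ru and the MIME-style rule for %tl.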
I would like the %ru component to be escaped according to the same rules
as Apache's ap_escape_logitem() function at
apache_1.3.33/src/main/util.c:1444. This function escapes the following
8-bit characters when found in the URL:
/* For logging, escape all control characters,
* double quotes (because they delimit the request in the log file)
* backslashes (because we use backslash for escaping)
* and 8-bit chars with the high bit set
*/
Which works out as:
" => \"
\ => \\
<BS> => \b (backspace as a C escape sequence; \xHH would be acceptable
instead)
<NL> => \n (newline as a C escape sequence; \xHH would be acceptable
instead)
<CR> => \r (carriage return as a C escape sequence; \xHH would be
acceptable instead)
<TAB> => \t (tab as a C escape sequence; \xHH would be acceptable
instead)
<VT> => \v (vertical tab, i.e. Ctrl-K, as a C escape sequence; \xHH
would be acceptable instead)
iscntrl(c) => \xHH (the remaining control chars)
!isprint(c) => \xHH (I'm not sure how reliably !isprint(c) matches
(c & 0x80)?)
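
As a rough illustration of the mapping above, here is a minimal C sketch. It is neither Apache's ap_escape_logitem() nor Squid's code: the name escape_logitem, the buffer-based signature and the lowercase hex digits are assumptions made for the example. It tests byte ranges directly instead of calling isprint(), which sidesteps the locale question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, escaping double quotes,
 * backslashes, control characters and bytes with the high bit set.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small.
 */
size_t escape_logitem(const char *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char buf[5];                 /* longest escape is \xHH plus NUL */
        size_t len = 2;              /* default: backslash + one letter */

        buf[0] = '\\';
        switch (c) {
        case '"':  buf[1] = '"';  break;
        case '\\': buf[1] = '\\'; break;
        case '\b': buf[1] = 'b';  break;
        case '\n': buf[1] = 'n';  break;
        case '\r': buf[1] = 'r';  break;
        case '\t': buf[1] = 't';  break;
        case '\v': buf[1] = 'v';  break;
        default:
            if (c < 0x20 || c >= 0x7f) {
                /* remaining control chars, DEL and 8-bit bytes */
                snprintf(buf, sizeof buf, "\\x%02x", (unsigned int)c);
                len = 4;
            } else {
                buf[0] = (char)c;    /* printable ASCII passes through */
                len = 1;
            }
            break;
        }

        if (n + len + 1 > dstlen)    /* keep room for the final NUL */
            return (size_t)-1;
        memcpy(dst + n, buf, len);
        n += len;
    }

    if (n + 1 > dstlen)
        return (size_t)-1;
    dst[n] = '\0';
    return n;
}
```

Fed the URL fragment awstats.pl"w;wget", this sketch produces awstats.pl\"w;wget\".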
I don't think the Squid double-quoted string escape rules are 100%
identical to this, but they are quite likely sufficient for your purposes.
Patches making the double-quoted string escape rule better are welcome if
you find the format used by Squid insufficiently escaped.
Your use of the term "quoting" in the email, and the modifier letter being
" (giving %"ru), make me believe the new resulting output will be like:
Just poor wording on my part. What I meant was escaping, and this is also
what the patch tries to clarify in the logformat documentation.
"GET "http://62.XX.XX.109//awstats.pl\"w;wget" HTTP/1.1"
When what I really meant (in spite of my typo in the example) was:
"GET http://62.XX.XX.109//awstats.pl\"w;wget\" HTTP/1.1"
Which is what you will get.
From the current documentation in the patch:
The <format specification> is a string with embedded % format codes
% format codes all follow the same basic structure where all but
the formatcode is optional. Output strings are automatically escaped
as required according to their context and the output format
modifiers are usually not needed, but can be specified if an explicit
output format is desired.
    % ["|[|'|#] [-] [[0]width] [{argument}] formatcode

    "       output in quoted string format
    [       output in squid text log format as used by log_mime_hdrs
    '       output as-is
    -       left aligned
    width   field width. If starting with 0 then the
            output is zero padded
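
To show how the token structure above fits together, here is a minimal C parsing sketch. It is not Squid's parser: struct fmt_token and parse_fmt_token are hypothetical names, and the rule that a formatcode is an optional '<' or '>' followed by ASCII letters is an assumption made only so the example knows where a code ends.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of one parsed % token. */
struct fmt_token {
    char quote;       /* '"', '[', '\'' or '#'; 0 if no modifier given */
    int  left;        /* 1 if '-' (left aligned) was given */
    int  zero_pad;    /* 1 if the width started with 0 */
    int  width;       /* field width; 0 if absent */
    char arg[64];     /* text between { and }, empty if absent */
    char code[8];     /* the formatcode, e.g. "ru" or ">h" */
};

/*
 * Parse one token starting at s, which must point at '%'.
 * Returns a pointer just past the token, or NULL on a syntax error.
 */
const char *parse_fmt_token(const char *s, struct fmt_token *t)
{
    size_t n = 0;

    assert(s != NULL && t != NULL);
    memset(t, 0, sizeof *t);          /* also NUL-terminates arg and code */

    if (*s++ != '%')
        return NULL;
    if (*s != '\0' && strchr("\"['#", *s) != NULL)
        t->quote = *s++;              /* optional output format modifier */
    if (*s == '-') {
        t->left = 1;
        s++;
    }
    if (*s == '0') {
        t->zero_pad = 1;
        s++;
    }
    while (isdigit((unsigned char)*s))
        t->width = t->width * 10 + (*s++ - '0');
    if (*s == '{') {
        const char *end = strchr(++s, '}');
        if (end == NULL || (size_t)(end - s) >= sizeof t->arg)
            return NULL;
        memcpy(t->arg, s, (size_t)(end - s));
        s = end + 1;
    }
    /* Assumed shape of a formatcode: optional '<' or '>', then letters. */
    if (*s == '<' || *s == '>')
        t->code[n++] = *s++;
    while (isalpha((unsigned char)*s) && n < sizeof t->code - 1)
        t->code[n++] = *s++;
    if (n == 0 || !isalpha((unsigned char)t->code[n - 1]))
        return NULL;                  /* the formatcode is mandatory */
    return s;
}
```

Because the sketch consumes letters greedily, it only finds the end of a formatcode correctly when the code is followed by a non-letter such as a space or a closing quote.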
Corrections/additions to make the documentation easier to understand are
always welcome.
Regards
Henrik