On Oct 23, 2008, at 2:10 PM, Jochem Maas wrote:
The order is reversed, so if $host has a non-zero length, it is not
escaped.
first thing that I noticed, second wondering why no charset was
specified, thirdly was wondering why it's not plain:
$host = htmlentities($host);
but nonetheless your point stands, :-)
Yeah, fair enough.
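
For anyone reading this without the original code at hand, here is a minimal sketch of the kind of mistake under discussion. The original line is not quoted in this thread, so the ternary below is a hypothetical reconstruction, and the function names escape_host_buggy() and escape_host() are made up for illustration. The charset is passed explicitly, per the second point above.

```php
<?php
// Hypothetical reconstruction: the exact original line is not shown in this thread.
function escape_host_buggy(string $host): string
{
    // Bug: the two branches are reversed, so any non-empty $host
    // is returned as-is and only the empty string gets "escaped".
    return strlen($host) > 0 ? $host : htmlentities($host, ENT_QUOTES, 'UTF-8');
}

// The plain version suggested above, with the charset stated explicitly.
function escape_host(string $host): string
{
    return htmlentities($host, ENT_QUOTES, 'UTF-8');
}
```

Escaping an empty string is harmless, so the length check buys nothing here; dropping it removes the place where the order could go wrong.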
To my credit, I also noticed the problem without spending more than a
second or two on that line, but I also recognized how it could be
missed. To me, it's similar to missing when someone calls a function
and gets the order of arguments wrong. You can tell what they meant,
so the error doesn't stand out as boldly. Perhaps subconsciously you
anticipate that they're right, because in most of the code, they are.
The challenge of being perfect is why I've developed a number of tools
to help me out. I'm going to release one of the best of these as open
source in a few months. I might mention that on this list, since it
seems appropriate. Hopefully no one will mind the "advertising" too
much. :-)
now about that charset ... your blog post uses UTF-7 to demonstrate the
potential for problems ... but htmlentities() doesn't support that charset,
or at least not according to the docs, in fact the list of supported charsets
is quite limited, out of curiosity what would your recommendation be
if one is faced with having to 'htmlentize' a string encoded in UTF-7 or
some other charset not supported by htmlentities()?
That's a good question. I would probably convert it to something like
UTF-8, escape it, then convert it back. I've never faced this
situation, and the scenario I was recreating in my post was when
someone attacked Google using UTF-7. Google didn't actually want to
support that character encoding.
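
The convert, escape, convert-back idea can be sketched roughly as follows. This is untested in production and assumes the mbstring extension is available and knows the source charset; escape_via_utf8() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Sketch only: escape_via_utf8() is a hypothetical helper, not part of PHP.
// Requires the mbstring extension.
function escape_via_utf8(string $input, string $charset): string
{
    // 1. Convert from the charset htmlentities() does not support to UTF-8.
    $utf8 = mb_convert_encoding($input, 'UTF-8', $charset);

    // 2. Escape, stating the charset explicitly.
    $escaped = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

    // 3. Convert the escaped text back to the original charset.
    return mb_convert_encoding($escaped, $charset, 'UTF-8');
}

$utf7 = '+ADw-script+AD4-';               // "<script>" encoded as UTF-7
$safe = escape_via_utf8($utf7, 'UTF-7');  // still UTF-7, but now entity-escaped
```

If the application never meant to support the odd charset in the first place, declaring the intended charset in the Content-Type header is the simpler fix.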
If you specify ISO-8859-1 in your Content-Type header, it's actually
fine to omit the character encoding in htmlentities(), because it uses
that by default. (Also, not all mismatches are exploitable.) However,
it always catches my eye, because it demonstrates a lax treatment of
character encoding in general. I like to see it explicitly declared
everywhere.
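
As an illustration of declaring it everywhere, one option is to keep the charset in a single constant and use it both in the header and when escaping. The names CHARSET, send_html_header(), and escape_html() are assumptions made up for this sketch, and UTF-8 is just an example value.

```php
<?php
// Single source of truth for the character encoding (example value).
const CHARSET = 'UTF-8';

// Hypothetical helper: declare the encoding in the Content-Type header.
// Call it before any output is sent.
function send_html_header(): void
{
    header('Content-Type: text/html; charset=' . CHARSET);
}

// Hypothetical helper: escape output using the same declared encoding.
function escape_html(string $value): string
{
    return htmlentities($value, ENT_QUOTES, CHARSET);
}
```

With this in place the header and the escaping cannot drift apart, because both read the same constant.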
a second question: strip_tags() doesn't have a charset parameter, how does
it manage to cope without knowing the input string encoding? or does it
not and is it actually vulnerable to maliciously encoded input?
My guess would be that it doesn't cope. :-) I never use strip_tags(),
so someone else might be able to offer a much better answer.
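
A quick way to probe that guess is to feed strip_tags() a UTF-7 encoded tag, which contains no literal "<" or ">" bytes. This is only a sketch of one case, not a full answer; it assumes the mbstring extension for the decoding step, and whether a browser would ever treat the page as UTF-7 depends on the declared charset.

```php
<?php
// "<script>alert(1)</script>" encoded as UTF-7: no literal angle brackets.
$payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

// strip_tags() sees no "<" to start a tag, so nothing is removed.
$stripped = strip_tags($payload);

// What a client interpreting the bytes as UTF-7 would see (needs mbstring).
$decoded = mb_convert_encoding($stripped, 'UTF-8', 'UTF-7');
```

Here $stripped is identical to $payload, and $decoded contains the script tag again, which is consistent with strip_tags() not being aware of the encoding.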
Hope that helps, and thanks for the discussion.
Chris
--
Chris Shiflett
http://shiflett.org/
--
PHP General Mailing List (http://www.php.net/)