Re: Re: sanitizing/security

Chris Shiflett <shiflett@xxxxxxx> · Tue, 21 Dec 2004 00:06:54 -0800 (PST)

--- Richard Lynch <ceo@xxxxxxxxx> wrote:
> What regular expression does one use when there really isn't a
> whole lot you can say about the text?...
> 
> I mean, say for a guestbook or bulletin board or for a person's
> Bio or...
> 
> You can limit it to a certain number of characters in length.
> 
> You can mess with strip_tags and also do an ereg to rip out any
> kind of JavaScript on tags you want to *allow*.
> 
> But then what?
> 
> I mean, it seems like there's still an awful lot of wiggle room
> for mischief there, in an arbitrary string typed by the user.

This type of data is certainly the most difficult to filter, especially if
you try to adhere to very strict security principles.

You start with the same question as with any other data - what exactly do
you want to allow? This is much easier and less prone to error than asking
what you want to reject. If someone is entering a bio, a whitelist is
difficult to create, but not impossible. The best approach to take when
you don't yet know what valid data looks like is to create a system that
learns. This can be as simple as enabling a whitelist and logging every
failure, while relying on some other method for interim protection (i.e.,
a whitelist failure is not yet treated as a security breach). Manual
inspection of the logged failures can be used to enhance the whitelist,
and once you feel it is capable, you can switch to it as the primary
method of protection.
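
As a rough sketch of that "learning" setup (not a finished filter): the
pattern, the 2000-character limit, the function names, and the log file
argument below are all assumptions made up for illustration. The whitelist
only observes and logs, while htmlentities() supplies the interim
protection.

```php
<?php
// Hypothetical whitelist for a short bio: letters, digits, whitespace and
// common punctuation. The pattern and the length limit are assumptions.
const BIO_PATTERN = '/\A[\p{L}\p{N}\s.,;:!?\'"()\-]{1,2000}\z/u';

/**
 * Returns true when $text passes the whitelist. Failures are logged for
 * manual review instead of being treated as a security breach.
 */
function bio_passes_whitelist(string $text, string $logFile): bool
{
    if (preg_match(BIO_PATTERN, $text) === 1) {
        return true;
    }
    // Record the rejected input so the whitelist can be refined later.
    // Message type 3 appends the message to the given file.
    error_log(date('c') . ' whitelist miss: ' . json_encode($text) . PHP_EOL, 3, $logFile);
    return false;
}

/**
 * Learning mode: the whitelist result is only logged; escaping with
 * htmlentities() is what protects the page in the meantime.
 */
function prepare_bio_for_display(string $text, string $logFile): string
{
    bio_passes_whitelist($text, $logFile);
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}
```

Once the log stops turning up legitimate input, the boolean from
bio_passes_whitelist() could be used to reject data outright.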

I must admit that I often take the lazy way out (with the caveat that some
situations demand a higher level of security and a more strict adherence
to best practices). The lazy way to escape output is htmlentities(), a
function that converts every character that has an equivalent HTML entity
to that entity (pass ENT_QUOTES so that single quotes are converted as
well). Thus, any character that may have special meaning to a browser is
converted to something that is only useful in displaying that character.
If you want to allow some markup, convert those entities back (use a
literal match when possible, and pattern matching only as a last resort).
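
Something along these lines would do the "escape everything, then convert
a few things back" step. The function name and the list of allowed tags
are assumptions picked for the example, and only exact, attribute-free
tags are restored.

```php
<?php
/**
 * Escapes everything, then restores a small set of allowed tags by
 * literal match. The tag list is an assumption chosen for illustration.
 */
function escape_allowing_basic_markup(string $text): string
{
    $escaped = htmlentities($text, ENT_QUOTES, 'UTF-8');

    // Map each escaped tag back to its literal form. Tags carrying
    // attributes never match, so "<b onclick=...>" stays escaped.
    $allowed = [
        '&lt;b&gt;'  => '<b>',
        '&lt;/b&gt;' => '</b>',
        '&lt;i&gt;'  => '<i>',
        '&lt;/i&gt;' => '</i>',
    ];

    return str_replace(array_keys($allowed), array_values($allowed), $escaped);
}

echo escape_allowing_basic_markup('<b>Hi</b> <script>alert("x")</script>'), PHP_EOL;
// prints: <b>Hi</b> &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

Because the match is literal, there is no pattern for an attacker to
trick; the cost is that every permitted tag has to be listed explicitly.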

When using something in an SQL query, there are some good escaping
functions that can be used. I feel pretty comfortable using
mysql_real_escape_string() (which, unlike mysql_escape_string(), takes the
connection's character set into account) on any data to make SQL
injection impractical. Of course, this shouldn't be a complete substitute
for proper data filtering, so I'm still talking about the lazy (or "least
you can do") approach.
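
For anyone reading this on a current PHP: the mysql_* escaping functions
belong to the old mysql extension, which was removed in PHP 7.0. The
sketch below swaps in a different technique with the same goal, namely a
PDO prepared statement with bound parameters rather than string escaping.
The in-memory SQLite database, the guestbook table, and the add_entry()
helper are assumptions that exist only to keep the example runnable.

```php
<?php
// In-memory SQLite keeps the sketch self-contained; the table and column
// names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE guestbook (name TEXT, message TEXT)');

function add_entry(PDO $db, string $name, string $message): void
{
    // Placeholders keep the user's data out of the SQL text entirely.
    $stmt = $db->prepare(
        'INSERT INTO guestbook (name, message) VALUES (:name, :message)'
    );
    $stmt->execute([':name' => $name, ':message' => $message]);
}

// Quotes and SQL fragments in the input are stored as plain data.
add_entry($db, "O'Reilly", "Robert'); DROP TABLE guestbook; --");
$count = (int) $db->query('SELECT COUNT(*) FROM guestbook')->fetchColumn();
```

As with escaping, this only addresses injection; it does not replace
filtering the data itself.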

So, while I agree that free-form text is very difficult to filter, there
are some pretty simple steps you can take to mitigate the risks, or you
can adhere to strict practices if you work at it.

Hope that helps.

Chris

=====
Chris Shiflett - http://shiflett.org/

PHP Security - O'Reilly     HTTP Developer's Handbook - Sams
Coming Soon                 http://httphandbook.org/
