Re: Preventing SQL Injection/ Cross Site Scripting

Just my two cents worth...

Magic quotes are the work of the devil. It's a shame that so many PHP installations have them enabled, and a huge disappointment that PHP is actually distributed with them enabled! A script can't change the magic_quotes_gpc setting at runtime, which creates a real hassle and is my major gripe about the whole situation.

I've *always* followed the programming practice of "work with your data unencoded, then encode it appropriately only at the final output stage". That way you always know exactly what you're working with, with no surprises: each character is always 1 byte, regardless of what character it is.

Here's a typical block of code which I include at the start of nearly all my PHP scripts:

<?php
//Do not delete this function! (unless you don't mind data corruption with PHP's default settings)
function stripslashes_deep($value) {
    return is_array($value) ? array_map('stripslashes_deep', $value) : stripslashes($value);
}
//Get rid of those stupid damn annoying asinine magic quotes which just garble up your data.
if (get_magic_quotes_gpc()) {
    /*
    (unfortunately in PHP these are enabled by default.  AHH!  Which idiot
    thought it was a good idea to turn them on by default?  Good programming
    practice is to manually encode only the data that requires encoding
    just before dumping it to places which need it (ie. databases), not
    automatically screwing up the entire collection of the system's variables!
    AHH!)
    */
    $GLOBALS['HTTP_POST_VARS'] = stripslashes_deep($GLOBALS['HTTP_POST_VARS']);
    $GLOBALS['_POST'] = stripslashes_deep($GLOBALS['_POST']);
    $GLOBALS['HTTP_GET_VARS'] = stripslashes_deep($GLOBALS['HTTP_GET_VARS']);
    $GLOBALS['_GET'] = stripslashes_deep($GLOBALS['_GET']);
    $GLOBALS['HTTP_COOKIE_VARS'] = stripslashes_deep($GLOBALS['HTTP_COOKIE_VARS']);
    $GLOBALS['_COOKIE'] = stripslashes_deep($GLOBALS['_COOKIE']);
    $GLOBALS['HTTP_SERVER_VARS'] = stripslashes_deep($GLOBALS['HTTP_SERVER_VARS']);
    $GLOBALS['_SERVER'] = stripslashes_deep($GLOBALS['_SERVER']);
    $GLOBALS['HTTP_ENV_VARS'] = stripslashes_deep($GLOBALS['HTTP_ENV_VARS']);
    $GLOBALS['_ENV'] = stripslashes_deep($GLOBALS['_ENV']);
    $GLOBALS['HTTP_POST_FILES'] = stripslashes_deep($GLOBALS['HTTP_POST_FILES']);
    $GLOBALS['_FILES'] = stripslashes_deep($GLOBALS['_FILES']);
    $GLOBALS['_REQUEST'] = stripslashes_deep($GLOBALS['_REQUEST']);
}
set_magic_quotes_runtime(0); //Fortunately these can be killed with a single statement, unlike magic_quotes_gpc
?>

Eh, don't mind the comments. Sometimes PHP programming can become quite frustrating. ;-)


On to the next stage... encoding data for output to an HTML document.

Personally, I prefer using htmlspecialchars() over htmlentities(), because it only converts those characters that *must* be converted for HTML ( & < > " ). There's no use in turning your other 1-byte characters into 5, 6, or 7-byte strings, if you already provided the correct character set in the Content-Type HTTP header (as you should!).

Actually, if you want to get really picky, I usually use the following conversions:

For most tag parameters: htmlspecialchars($tagdata)

For display text: nl2br(htmlspecialchars($displaytext))
(This keeps newline sequences in effect.)
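
To make the first two conversions concrete, here is a minimal sketch. The sample strings and the surrounding markup are made up for illustration, and the explicit ISO-8859-1 charset argument is an assumption about a single-byte setup; substitute whatever charset your Content-Type header declares. Note that with ENT_COMPAT, htmlspecialchars() leaves single quotes untouched, so the attribute value has to be wrapped in double quotes.

```php
<?php
// Hypothetical sample values; the variable names follow the ones used above.
$tagdata     = 'Tom & "Jerry" <live>';
$displaytext = "1 < 2\nand 3 > 2";

// Tag parameter: encode, then place inside a double-quoted attribute.
$input = '<input type="text" name="title" value="'
       . htmlspecialchars($tagdata, ENT_COMPAT, 'ISO-8859-1')
       . '">';

// Display text: encode first, then turn newlines into <br /> tags.
$paragraph = '<p>'
           . nl2br(htmlspecialchars($displaytext, ENT_COMPAT, 'ISO-8859-1'))
           . '</p>';

echo $input, "\n", $paragraph, "\n";
```

The order matters for display text: calling nl2br() first would have its <br /> tags encoded into visible text by htmlspecialchars().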

For text which may contain a few control characters, special characters, or other binary data (sometimes useful in hidden form fields, or for special accented characters and non-English languages):
preg_replace('/([\\x00-\\x1F\\x7F-\\xFF])/e','"&#".ord(substr("$1",-1)).";"',htmlspecialchars($binarytext))
(This encodes the data in a mostly human-readable form, using only 7-bit ASCII characters.)
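
The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so on newer installations the same conversion has to go through preg_replace_callback(). The following is only a sketch of that substitution: the function name encode_7bit() is made up, closures need PHP 5.3 or later, and it assumes the text is in a single-byte charset such as ISO-8859-1, so that each byte value is also the character's code point.

```php
<?php
// Hypothetical helper name. Assumes single-byte (ISO-8859-1) input.
function encode_7bit($binarytext) {
    // First encode the characters that are special in HTML.
    $escaped = htmlspecialchars($binarytext, ENT_COMPAT, 'ISO-8859-1');

    // Then replace every control byte and every byte >= 0x7F
    // with a decimal numeric character reference.
    return preg_replace_callback(
        '/[\x00-\x1F\x7F-\xFF]/',
        function ($m) {
            return '&#' . ord($m[0]) . ';';
        },
        $escaped
    );
}

echo encode_7bit("caf\xE9 <b>\n"), "\n";
```

Because the callback receives the raw match, the substr("$1",-1) workaround for the slashes that /e added is no longer needed.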

For binary data (sometimes useful in hidden form fields):
strtr(base64_encode($binarydata),'+/=','-_.')
(All the advantages of Base64 encoding, without incurring any overhead from URL encoding when the form is submitted.)
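
The Base64 variant needs a matching decoder on the receiving side, which simply reverses the character mapping before calling base64_decode(). A small round-trip sketch, with made-up function names:

```php
<?php
// Hypothetical helper names for the encode/decode pair.
function form_safe_encode($binarydata) {
    // Swap the three characters that would be URL-encoded on submission.
    return strtr(base64_encode($binarydata), '+/=', '-_.');
}

function form_safe_decode($encoded) {
    // Undo the swap, then decode as ordinary Base64.
    return base64_decode(strtr($encoded, '-_.', '+/='));
}

$binarydata = "\x00\xFF\xFE binary?>";
$encoded    = form_safe_encode($binarydata);
echo $encoded, "\n";
```

The encoded value can be dropped into a hidden field's value attribute as-is, since it contains only letters, digits, and the characters - _ .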


Anyhow, back on track to the original topic of this thread. For anything that gets written to a database or used for a query, I suggest escaping the data using a function specifically designed for that database. (And there are many different functions for the many different popular databases.) This should have *nothing* to do with blocking XSS, turning < into &lt;, etc. Preparing for the database query string is no place to do the data conversion which will be necessary for the final output.
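
As one possible illustration of database-specific escaping, the sketch below uses PDO::quote() with an in-memory SQLite database. The choice of PDO and SQLite, and the table and column names, are assumptions made only so the example runs on its own (it needs the pdo_sqlite extension); with MySQL you would use that driver's own escaping function instead.

```php
<?php
// Assumption: pdo_sqlite is available. Table and column names are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (author TEXT, body TEXT)');

// Raw, unencoded data, including quotes, semicolons and HTML.
$author = "O'Reilly";
$body   = "x'); DROP TABLE comments; -- <b>hi</b>";

// Escape with the driver's own function just before assembling the query.
$sql = 'INSERT INTO comments (author, body) VALUES ('
     . $pdo->quote($author) . ', ' . $pdo->quote($body) . ')';
$pdo->exec($sql);

// The stored data comes back byte-for-byte identical, HTML and all.
$row = $pdo->query('SELECT author, body FROM comments')->fetch(PDO::FETCH_ASSOC);
echo $row['author'], "\n", $row['body'], "\n";
```

Nothing is stripped or HTML-encoded on the way in; the < and > in the body are dealt with later, when the row is written into an HTML page.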


The last topic... blocking XSS attacks. If you use the encoding routines I listed above for outputting to HTML documents, you're already safe. And you're not outlawing any characters either... if someone wants to type < and >, or show semi-colons or whatever, they can, knowing with certainty that what they type is exactly what others will see. If you need to let users enter some mark-up, do what message boards and web log sites have been doing for years: BBcode. Then you can write your own routines to provide only the features you need, using a code format that's much stricter than HTML. This can greatly simplify your markup code engine too, compared to making a selective HTML filter.
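
A very small BBcode engine along these lines might look like the sketch below. It is an illustration only: the function name, the three supported tags, and the ISO-8859-1 charset are all assumptions, and a real engine would also want to handle nesting and unbalanced tags. The key point is the order: everything is HTML-encoded first, and only then are a few strictly matched codes turned into tags.

```php
<?php
// Hypothetical function name; supports only [b], [i] and [url=...].
function render_bbcode($text) {
    // 1. Encode everything, so no user-typed markup survives as HTML.
    $html = htmlspecialchars($text, ENT_QUOTES, 'ISO-8859-1');

    // 2. Convert the small set of allowed codes into tags.
    $html = preg_replace('#\[b\](.*?)\[/b\]#s', '<b>$1</b>', $html);
    $html = preg_replace('#\[i\](.*?)\[/i\]#s', '<i>$1</i>', $html);

    // Only http:// and https:// links are accepted; anything else
    // (javascript:, data:, ...) is left as plain text.
    $html = preg_replace(
        '#\[url=(https?://[^\]\s]+)\](.*?)\[/url\]#s',
        '<a href="$1">$2</a>',
        $html
    );

    // 3. Keep the user's line breaks.
    return nl2br($html);
}

echo render_bbcode("[b]<script>[/b] and [url=http://example.com/?a=1&b=2]a link[/url]"), "\n";
```

Because the URL has already been through htmlspecialchars() by the time it is matched, any quote character in it arrives as an entity and cannot break out of the href attribute.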


In any case, here's the data flow (in my wonderful ASCII-art)  ;-)  :

For input:
User input / source data
     \/
Database escaping function
     \/
Assemble database query string

For basic output:
Source data
     \/
HTML encoding algorithm [most likely nl2br(htmlspecialchars())]
     \/
user-agent (ie. site visitor's web browser)

For fancy output:
Source data
     \/
BBcode interpreter engine and HTML tag assembler <--------> HTML encoding algorithms
     \/
user-agent (ie. site visitor's web browser)



Follow these guidelines, and your scripts will be 100% binary-safe, secure from XSS attacks, immune to SQL injection attempts, and very user-friendly since users have the entire character set available to them without any constraints.




Chris Shiflett wrote:

Dotan Cohen wrote:
One note, I remove semicolons from the user input to thwart SQL
injection as they can be used to terminate an SQL query and are
very uncommon in regular speech. However, htmlspecialchars()
and htmlentities() add semicolons when converting. Is this
dangerous, ie, can this be exploited?

If you ever use htmlentities() to escape data for SQL or
mysql_real_escape_string() to escape data for HTML, then yes, it is
dangerous. Escaping functions are context-dependent.

Hope that helps.

Chris


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

