On 24/04/07, Justin Frim <jfrim@xxxxxxxxxxx> wrote:
Just my two cents worth...

Magic quotes are the work of the devil. It's a shame that so many PHP installations have them enabled, and a huge disappointment that PHP is actually distributed with this stuff enabled! The mere fact that a script can't change this setting creates a real hassle and is my major gripe about the whole situation.

I've *always* followed the programming practice of "work with your data unencoded, then encode it appropriately only at the last final output stage". That way you always know exactly what you're working with, no surprises, where each character is always 1 byte, regardless of what character it is.

Here's a typical block of code which I include at the start of nearly all my PHP scripts:

<?php
//Do not delete this function! (unless you don't mind data corruption with PHP's default settings)
function stripslashes_deep($value) {
    return is_array($value) ? array_map('stripslashes_deep', $value) : stripslashes($value);
}

//Get rid of those stupid damn annoying asinine magic quotes which just garble up your data.
if (get_magic_quotes_gpc()) {
    /* (unfortunately in PHP these are enabled by default. AHH! Which idiot thought this was a good idea to turn them on by default? Good programming practise is to manually encode only the data that requires encoding
You've got a typo in practice.
just before dumping it to places which need it (ie. databases), not automatically screwing up the entire collection of the system's variables! AHH!) */
    $GLOBALS['HTTP_POST_VARS'] = stripslashes_deep($GLOBALS['HTTP_POST_VARS']);
    $GLOBALS['_POST'] = stripslashes_deep($GLOBALS['_POST']);
    $GLOBALS['HTTP_GET_VARS'] = stripslashes_deep($GLOBALS['HTTP_GET_VARS']);
    $GLOBALS['_GET'] = stripslashes_deep($GLOBALS['_GET']);
    $GLOBALS['HTTP_COOKIE_VARS'] = stripslashes_deep($GLOBALS['HTTP_COOKIE_VARS']);
    $GLOBALS['_COOKIE'] = stripslashes_deep($GLOBALS['_COOKIE']);
    $GLOBALS['HTTP_SERVER_VARS'] = stripslashes_deep($GLOBALS['HTTP_SERVER_VARS']);
    $GLOBALS['_SERVER'] = stripslashes_deep($GLOBALS['_SERVER']);
    $GLOBALS['HTTP_ENV_VARS'] = stripslashes_deep($GLOBALS['HTTP_ENV_VARS']);
    $GLOBALS['_ENV'] = stripslashes_deep($GLOBALS['_ENV']);
    $GLOBALS['HTTP_POST_FILES'] = stripslashes_deep($GLOBALS['HTTP_POST_FILES']);
    $GLOBALS['_FILES'] = stripslashes_deep($GLOBALS['_FILES']);
    $GLOBALS['_REQUEST'] = stripslashes_deep($GLOBALS['_REQUEST']);
}
set_magic_quotes_runtime(0); //Fortunately these can be killed with a single statement, unlike magic_quotes_gpc
?>
That's bad. For a feature that was meant to make life easier, magic quotes sure has caused a lot of problems. I believe it will not be available in PHP 6.
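A note for anyone reusing the block quoted above on a newer PHP: magic_quotes_gpc was removed in PHP 5.4 and get_magic_quotes_gpc() itself was removed in PHP 8.0, so the unconditional call fails there. Below is a minimal sketch of a guarded variant. The helper names magic_quotes_active() and undo_magic_quotes() are my own and not from the original post, and the sketch only touches the GET/POST/COOKIE data, which is what magic_quotes_gpc applied to.

```php
<?php
// Recursively strip slashes from a string or a nested array of strings.
function stripslashes_deep($value) {
    return is_array($value)
        ? array_map('stripslashes_deep', $value)
        : stripslashes($value);
}

// Hypothetical helper (not from the original post): report whether
// magic_quotes_gpc is on, without calling a function that may not exist.
// The @ hides the deprecation notice that PHP 7.4 emits for this call.
function magic_quotes_active() {
    return function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc();
}

// Hypothetical helper: undo the quoting only when it was actually applied.
function undo_magic_quotes($data, $active) {
    return $active ? stripslashes_deep($data) : $data;
}

$active  = magic_quotes_active();
$_GET    = undo_magic_quotes($_GET, $active);
$_POST   = undo_magic_quotes($_POST, $active);
$_COOKIE = undo_magic_quotes($_COOKIE, $active);
```

Passing the flag in as a parameter, rather than checking it inside the helper, keeps the stripping logic easy to exercise on any PHP version.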
Eh, don't mind the comments. Sometimes PHP programming can become quite frustrating. ;-)

On to the next stage... encoding data for output to an HTML document. Personally, I prefer using htmlspecialchars() over htmlentities(), because it only converts those characters that *must* be converted for HTML ( & < > " ). There's no use in turning your other 1-byte characters into 5, 6, or 7-byte strings, if you already provided the correct character set in the Content-Type HTTP header (as you should!).

Actually, if you want to get really picky, I usually use the following conversions:

For most tag parameters:
    htmlspecialchars($tagdata)

For display text:
    nl2br(htmlspecialchars($displaytext))
(This keeps newline sequences in effect.)

For text which may contain a few control characters, special characters, or other binary data (sometimes useful in hidden form fields, or for special accented characters and non-English languages):
    preg_replace('/([\\x00-\\x1F\\x7F-\\xFF])/e','"&#".ord(substr("$1",-1)).";"',htmlspecialchars($binarytext))
(This encodes the data in a mostly still human-readable form, entirely with 7-bit ASCII characters only.)

For binary data (sometimes useful in hidden form fields):
    strtr(base64_encode($binarydata),'+/=','-_.');
(All the advantages of Base64 encoding, without incurring any overhead from URL encoding when the form is submitted.)

Anyhow, back on track to the original topic of this thread. For anything that gets written to a database or used for a query, I suggest escaping the data using a function specifically designed for that database. (And there are many different functions for the many different popular databases.) This should have *nothing* to do with blocking XSS, turning < into &lt;, etc. Preparing for the database query string is no place to do the data conversion which will be necessary for the final output.
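The four conversions listed above can be wrapped up roughly as follows. This is only a sketch under stated assumptions: the function names are made up for illustration, the data is assumed to be single-byte ISO-8859-1 (so that a byte value is also the right number for a numeric entity), ENT_QUOTES is my addition so that single-quoted attributes are covered too, and preg_replace_callback() stands in for the /e modifier, which was removed in PHP 7.

```php
<?php
// All function names here are hypothetical; they only wrap the calls
// named in the post. Assumption: the data is single-byte ISO-8859-1.

// For most tag parameters (attribute values).
function enc_attr($s) {
    return htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1');
}

// For display text: encode first, then turn newlines into <br /> tags.
function enc_text($s) {
    return nl2br(htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1'));
}

// For text with control or high-bit bytes: the output is pure 7-bit ASCII.
// preg_replace_callback() replaces the /e modifier used in the post.
function enc_7bit($s) {
    return preg_replace_callback(
        '/[\x00-\x1F\x7F-\xFF]/',
        function ($m) { return '&#' . ord($m[0]) . ';'; },
        htmlspecialchars($s, ENT_QUOTES, 'ISO-8859-1')
    );
}

// For binary data: Base64 with the three characters that would need
// URL encoding swapped for ones that do not.
function enc_b64url($bin) {
    return strtr(base64_encode($bin), '+/=', '-_.');
}

// The matching decoder; returns false on malformed input.
function dec_b64url($txt) {
    return base64_decode(strtr($txt, '-_.', '+/='), true);
}
```

The decoder is not in the post, but whatever receives the hidden form field needs the reverse mapping before calling base64_decode().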
I took Chris's advice and now filter for XSS after the info is retrieved from the database, just before sending it to the browser.
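To make that split concrete, here is a small sketch of the round trip: the text goes into the database exactly as typed and is HTML-encoded only on the way out. It uses PDO bound parameters in place of a database-specific escaping function, which is a substitution on my part and not what the post describes. It also assumes the pdo_sqlite extension is available, and the table and column names are invented for the example.

```php
<?php
// Assumptions: pdo_sqlite is installed; "comments" and "body" are made-up names.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE comments (body TEXT)');

// Input side: store the text unmodified. The bound parameter keeps it
// out of the SQL string, so no HTML filtering is needed at this point.
$raw  = "<script>alert('x')</script> O'Reilly";
$stmt = $db->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute(array(':body' => $raw));

// Output side: read the raw text back and HTML-encode it only now.
// The charset argument should match the page's Content-Type header.
$stored = $db->query('SELECT body FROM comments')->fetchColumn();
$html   = nl2br(htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'));

echo $html, "\n";
```

Because the stored value is untouched, the same row can later be sent to a plain-text email or a JSON response with whatever encoding that destination needs.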
The last topic... blocking XSS attacks. If you use the encoding routines I listed above for outputting to HTML documents, you're already safe. And you're not outlawing any characters either... if someone wants to type < and >, or show semi-colons or whatever, they can, knowing with certainty that what they type is exactly what others will see.

If you need to let users enter some mark-up, do what message boards and web log sites have been doing for years: BBcode. Then you can write your own routines to provide only the features you need, using a code format that's much stricter than HTML. This can greatly simplify your markup code engine too, compared to making a selective HTML filter.

In any case, here's the data flow (in my wonderful ASCII-art) ;-) :

For input:

    User input / source data
              \/
    Database escaping function
              \/
    Assemble database query string

For basic output:

    Source data
        \/
    HTML encoding algorithm [most likely nl2br(htmlspecialchars())]
        \/
    user-agent (ie. site visitor's web browser)

For fancy output:

    Source data
        \/
    BBcode interpreter engine and HTML tag assembler <--------> HTML encoding algorithms
        \/
    user-agent (ie. site visitor's web browser)

Follow these guidelines, and your scripts will be 100% binary-safe, secure from XSS attacks, immune to SQL injection attempts, and very user-friendly since users have the entire character set available to them without any constraints.
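As a rough illustration of the "fancy output" path, here is a very small BBcode renderer. It is a sketch and not a complete engine: render_bbcode() is a hypothetical name, and the supported tag set ([b], [i] and [url=...]) is chosen arbitrarily. The point it shows is the ordering: everything is HTML-encoded first, and only then is a fixed set of tags turned back into markup.

```php
<?php
// Hypothetical minimal BBcode renderer: supports only [b], [i] and [url=...].
function render_bbcode($text) {
    // 1. Encode everything first, so no user-typed HTML survives.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // 2. Re-introduce a fixed whitelist of tags.
    $html = preg_replace('#\[b\](.*?)\[/b\]#s', '<strong>$1</strong>', $html);
    $html = preg_replace('#\[i\](.*?)\[/i\]#s', '<em>$1</em>', $html);

    // Only http(s) links are accepted, so a javascript: URL is never
    // turned into a link. Quotes in the URL were already encoded in step 1.
    $html = preg_replace(
        '#\[url=(https?://[^\s\]\[]+)\](.*?)\[/url\]#s',
        '<a href="$1">$2</a>',
        $html
    );

    // 3. Keep the user's line breaks.
    return nl2br($html);
}

echo render_bbcode("[b]Hi[/b] <script>"), "\n";
```

Anything that does not match one of the patterns, such as a [url] with an unsupported scheme, is left as literal (already encoded) text.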
Thanks. Most of that has already been done now, but I'll certainly keep your functions handy. I'll likely need them at some point.

Dotan Cohen
http://dotancohen.com/howto/firefox_password_manager.php
http://lyricslist.com/lyrics/artist_albums/228/gordon_nina.html