Re: Effectiveness of pg_escape_string at blocking SQL injection attacks

Volkan YAZICI <volkan.yazici@xxxxxxxxx> · Sat, 28 May 2005 10:01:33 +0300

Hi,

On 5/27/05, Ed Finkler <coj@xxxxxxxxxxxxxxxxx> wrote:
> The php mysql api has a function "mysql_real_escape_string" that seems
> to be able to thwart known SQL injection attacks -- at least the ones of
> which I and other people I've discussed this with know.  I am curious to
> know if pg_escape_string is as effective.  If not, what would need to be
> modified to make it more effective?

I didn't hang around and began to trace mysql.com for a CVS
repository. After a five minutes of search, I found that MySQL uses
BitKeeper [1]. Now next thing to do is to find where
mysql_real_escape_string() [2] is defined in. And the result is:
mysys/charset.c [3]

[1] http://mysql.bkbits.net:8080/mysql-5.0/
[2] There're two functions defined in PHP for escaping MySQL statements:
    mysql_escape_string() and mysql_real_escape_string(). When I looked at the
    MySQL source code, I found that, both of 'em refers to
    escape_string_for_mysql() function. (As I saw, there's not any
escape routine
    specialized for binary data.)
[3] http://mysql.bkbits.net:8080/mysql-5.0/anno/mysys/charset.c@xxxxx

I found below replacements in escape_string_for_mysql() function:

/* {{{ Code snippet */

#ifdef USE_MB
/*
 If the next character appears to begin a multi-byte character, we
 escape that first byte of that apparent multi-byte character. (The
 character just looks like a multi-byte character -- if it were actually
 a multi-byte character, it would have been passed through in the test
 above.)
 Without this check, we can create a problem by converting an invalid
 multi-byte character into a valid one. For example, 0xbf27 is not
 a valid GBK character, but 0xbf5c is. (0x27 = ', 0x5c = \)
*/
if (use_mb_flag && (l= my_mbcharlen(charset_info, *from)) > 1)
{ 
  *to++= '\\';
  *to++= *from;
  continue;
}

/*
 [GBK: Encoding standard for Simplified Chinese, used in the People's
 Republic of China and in Singapore.]
*/
#endif

switch (*from) {
case 0: /* Must be escaped for 'mysql' */
 *to++= '\\';
 *to++= '0';
 break;
case '\n': /* Must be escaped for logs */
 *to++= '\\';
 *to++= 'n';
 break;
case '\r':
 *to++= '\\';
 *to++= 'r';
 break;
case '\\':
 *to++= '\\';
 *to++= '\\';
 break;
case '\'':
 *to++= '\\';
 *to++= '\'';
 break;
case '"': /* Better safe than sorry */
 *to++= '\\';
 *to++= '"';
 break;
case '\032': /* This gives problems on Win32 */
 *to++= '\\';
 *to++= 'Z';
 break;
default:
 *to++= *from;
}

/* }}} */

As I saw, MySQL follows different methods to escape input data. But
AFAIC, the method followed depends a little bit on the software
architecture, like handling literals from pointers. Thus, it shouldn't
be an objective comparision when we just look at "which one escapes
which char".

Anyway, I hope above escaping differences helps folks.

I also want to add something about using parameters instead of
escaping. There're lots of advantages of using parameters when
compared to escaping:
o. You just pass the value and don't need to worry about any
SQL-Injection attack.
o. When you try to escape a data, CPU usage increases with linear
proportional to data length. (Try escaping an MP3 file.)
o. As far as I understand from the length of this post's thread, you
can never be sure about the safety of any escaping method.

But as Ed Finkler underlined, most of the programmers prefer escaping
while sending SQL statements. So "parameter usage is more secure"
shouldn't be an answer for "are our escaping routines enough"
question.

Regards.