Hey Alex, take a look at this; At the top of each php page which interacts with a database, just have this one liner $DBH = safe_connection("database_name_here"); //$DBH stands for database handle obviously the safe_connection is not a built-in PHP function so we have to come up with it... The idea behind this "safe_connection" function is this; It takes the dbname, uses it in looking up to retrieve the database username, the password, the host name and the hostname, and the host type ( whether the host is mysql or mssql etc) - for the specified database. Then it uses all this data to establish a db connection and thus get the $DBHandle. Once the $DBHandle is obtained, then mysql_real_escape_string ( or the mysqli_real_escape_string version ) can be used.... (However, the mentioned mysql_real_escape_string function here would be the right choice **only if** the hosttype is mysql! ) So, that;s where we use the hosttype. Microsoft SQL may require a different escaping mechanism. Now, the question is where do we use this mysql_real_escape_string function? Well, on the usual suspects! the dirty 5 arrays; namely _GET, _POST, _COOKIE, _REQUEST and the _SERVER. Yes, the _SERVER too. ( that's due to the http_referer, remote_addr etc spoofing ). Here is a basic example handling the _GET array! foreach ($_GET as $k => $v) { $_GET[$k] = mysql_real_escape_string($v); // this is good if host type is mysql... } So, the basic idea is to clean up the entire GET array and be safe and thorough. And do this across all global arrays where a user input can possible come from. So, with this one liner function, called right at the beginning of your script, you not only get a DBHandle to do your queries but also get the assurance that the userinput is safe so you can get into busines instantly as follows; $safe_firstname = $_GET['firstname']; How easy is that! (To keep the basic idea short, I did not get into the magic_quotes_gpc and stripslashes() matter. But I assume people reading this message know whey are and how they get used. So, if you just focus on the basic idea, what do you say? ARE WE STILL NOT OK? Do we still need PDO? My answer to this question is ABSOLUTELY NO. But this NO is as far as the SQLInjection woes. PDO may offer other advantages warranting its use but as far as the SQLInjection is concerned and when we know that the data has been thoroughly escaped like this, using PDO will not make it any safer. Absolutely NOT. Do we all agree on that? It's a plain YES or NO question right here. As far as the C. Shifflet's article and Ilia's follow up post ( http://ilia.ws/archives/103-mysql_real_escape_string-versus-Prepared-Statements.html ) is concerned, the only thing we need to worry about is whether we are working with GBK character code, Chinese character set that is. If we got nothing to do with GBK char set, then the technique I covered above will suffice and cover us safely, conveniently and effortlessly. But if you do work with GBK and you do that in your script by actually running this ( mysql_query("SET CHARACTER SET 'gbk'", $c); ), then the above technique will doom you. Then PDO is your only bet, but otherwise, we are OK. As far as the escaping, I know you were against that. Here is what you said about the escaping. <quote> Escaping is a bad and many times failed attempt at saying that it's ok to pass user input as code, we just escape the characters we think are bad, to tell the interpreter not to execute them as it normally would. But what does it mean to be a character? Well back when all these languages were designed there was ASCII, and life was easy, now, however we have utf7, utf8, utf16, with tens of thousands of characters, many of which are the same symbol. Oh and they morph, if you don't know what best-fit matching is, look it up, but at the end of the day, if you think that you know what characters you need to escape, you are wrong, i'm sorry. This is why in javascript there are 3 escape functions: escape, escapeURI and escapeURIComponent. Which roughly translate to "we failed", "we failed again" and "we failed the third time". </quote> Correct me if I'm wrong, IF WE ARE NOT CHANGING THE CHAR SET WITHIN THE SCRIPT USING SOMETHING LIKE "SET CHARACTER SET", THEN WHAT"S WRONG WITH ESCAPING WITH THE RECOMMENDED UTILITY FOR THE SQL HOST AT HAND? From Ilia's post, I understand that even if the character set was to be GBK at server's config level, we would be fine. Here is the excerpt from that very section. <quote> In MySQL there are two ways to change the character set, you can do it by changing in MySQL configuration file (my.cnf) by doing: CODE: [client] default-character-set=GBK Or you can change on a per-connection basis, which is a common practice done by people without admin level access to the server via the following query: CODE: SET CHARACTER SET 'GBK' The problem with the latter, is that while it most certainly modified the charset it didn’t let the escaping facilities know about it. Which means that mysql_real_escape_string() still works on the basis of the default charset, which if set to latin1 (common default) will make the function work in a manner identical to addslashes() for our purposes. Another word, 0xbf27 will be converted to 0xbf5c27 facilitating the SQL injection. </quote> Which means, mysql_real_escape_string is aware of things! On Wed, Jan 18, 2012 at 2:36 PM, Alex Nikitin <niksoft@xxxxxxxxx> wrote: > There were a few questions, so i will reply in line. > > On Tue, Jan 17, 2012 at 7:22 PM, Haluk Karamete <halukkaramete@xxxxxxxxx> wrote: >> Thanks Alex. I re-read Ross's reply but I don't think you and him ( >> and even me ) is saying something different. It's all about validating >> the input. In my dictionary of programming vocabulary, validation is >> not just validating user input but also sanitizing it. > > Validating and cleaning user input is important, i'll explain what > user-input is later, however the issue i am trying to describe is > language interoperability and how escaping fails to solve that issue, > no matter how hard we try it's a bad solution because it will always > fail to differentiate between user and programmer-supplied inputs. > That is no matter how many slashes i put in front of a "'" <- tick or > a """ <-quote, when sql interpreter reads "select * from foo where > a=\\\\"b\\\\"" it will not know that "b" is a user-supplied input and > "select * from foo where a=" is a programmer supplied one. That is the > problem i am drawing attention to... > >> You summed it up nicely by saying programmer's string vs user's >> string. It's the user's string where we want to focus on... If we go >> to the source of the problem that is "receiving user input", that's >> where we start. >> >> So, every single user input ( not just query-string and form-post data >> or cookies but even the server variables such as the *http_referer" >> must be considered as potential threats. If we are to store the user >> ip based on the server var http_referer value - without >> validating/sanitizing it - then we are really risking things! I Aren't >> we? So we should not think that hey that's a $_SERVER variable so why >> worry about it... I wanted to bring this matter up as a side >> parenthesis cause I don't think http_referer would be on many >> programmers' list of things to be watched out and to be sanitized! >> >> So first things first, we ned to identify all possible ways that a >> user input can take place. and that's not limited to >> querystings/form-posts and cookies. Server vars are in my list too, >> not all of them but those that have issues with. If you guys can think >> of some other types ( like the server variables that one would not >> think from the userland ) that I may be missing, I'd appreciate your >> filling in here. > > Let me define what user-input is, in a web application, user input is > anything (and i repeat anything) that comes in on the wire, $_SERVER > and $_COOKIE and all of these vars get populated based on that, even > if it's an ajax call back or something, i can always write my side of > what you think is your client to send you anything anywhere, including > REMOTE_ADDR, whatever. Granted it's typically web-servers task to keep > that junk out of your server vars, nevertheless, i can not tell you > that they do, or do so correctly, or if you write your own server... i > digress. > >> Once we know what possible places are out there that a user string can >> make it, we apply our function right there and then. So, yes, we need >> a library function that does this for us for once and for all. This >> library function must deal with data validation (on data length and on >> data type ) as its first line of defense. > > There is no magic method that can take everyone's use case and say > "this is valid input" and "this is not". This is because what is valid > for you, is not valid for me, and the other way around, especially > with type-agnostic language, such as PHP. The only feasible way to do > this would be to build a form class that will auto-validate the input > based on the typedefs you give it for HTML for example, however that, > from practice makes the class fairly rigid, and cumbersome to use. The > other approach to this would be to build a class, that will check user > input for predefined patterns, patterns you can quickly recall, or > pass as a regex. One can then use those to create forms with more > advanced validation, but still allowing you to write fairly flexible > html, js and css around it. Done that too, however it still forces me > to write checks, and it still fails to be dead simple, a requirement > to make programmers write secure code. > >> In that function, we should also do the escape! And that bring us to >> the 2nd line of defense. >> >> In this 2nd line of defense, we must take care of the char code >> specific issues. And for that it looks like mysql_real_escape_string() >> is my only line of defense. That's per this article >> http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string >> But it looks like, Alex you'd have a problem with that. Your reply >> sounded like that. Correct me if I got you wrong. If I have not >> misunderstood you, you have an issue with mysql_real_escape... But >> then could you tell me how you would deal with the GPK and GPK like >> matters in the light of Shiflett's article? > > Again, escaping fails as a solution. I don't need to escape your input > if i tell sql that "hey, this is user input, don't execute it", and > with prepared statements i can do just that. However nobody wants to > write prepared statements, because they are as painful to write as > they are to read. And when you have an application of any moderate > complexity, with databases of moderate complexity, prepared statements > quickly migrate from migraines to a core of their own issues and > tickets. > > This was shown quite well by Mike Samuel of Google here: > http://google-caja.googlecode.com/svn/changes/mikesamuel/string-interpolation-29-Jan-2008/trunk/src/js/com/google/caja/interp/index.html > To save some time i assume you will follow the link and read the very > insightful article. > > However the above is not to say "let's just escape them then", no, the > above is to say that we need better ways of doing prepared statements, > without the pain of thereof. > > So some time ago i wrote a class (which i have been meaning to > rewrite) to see if i could make inline interpolation work well in php. > The code, which i didnt have a lot of time to execute (this was post > my 2-weeks notice at my last job, and i had a reporting framework to > write to make it pretty and neat, work, and easy(ish) to write reports > for), but basically it used reflection and a small parser to allow you > to run queries with inline interpolation (and a new syntax), for > example you could write "select * from foo where a=^^b" where b was a > declared variable, and the class would first make sure you were not > including strings, and then use prepared statement in the back end to > run your query. That seemed to work better than the alternative, > though came with issues of it's own, one of which was the need to keep > the code short and sweet to have any hope of any sort of decent > performance, well and simple lack of time to write it. > > At the end of the day, without in-language ability to do this, > prepared statements are a known problem and people will be as > discouraged as they have ever been to use them because of that. > > As far as i understand, PDO is supposed to help mitigate some of these > issue with traditional prepared statements, i wrote a class to deal > with that too (which the inline interpolation stuff was rolled into), > however it still doesn't fix the "prepared statement" aspect of the > problem, and i can still run a typical query, which is easier and > faster to write, and as we all know, most programmers are all for less > code. > >> And after that if the function is still returning OK, then I take that >> user input, use it in a PDO based implementation. >> >> Am I missing anything? Feel free to fill it in please so I can close >> this chapter and move on. > > Even with PDO, you still need to pass your variables in as variables, > and code as code, that negates the need to escape your variables for > MySQL. Interestingly there is still no such solution, even a broken > one, for the UI bits (html, css, js, etc) you are still at the mercy > of what is essentially "escaping". Suggestions on that front are > certainly welcome... > > One last note on more reasons why escaping fails. Another major reason > why escaping fails is because it fails open. Consider this mock up > code: > $var = $mysqli->real_escape_string($user_input); > ... > $mysqli->query("select * from foo where a=$var"); > > Now consider ... to be many hundreds of lines of code. And you are a > coder, and you have a ticket with your select failing sometimes. > > First course of action: $var = $user_input; > //$mysqli->real_escape_string($user_input); > > 2 hours later, when you figure out what the actual problem is, you > fail to return that escape, people forget. But you know what, > everything works, because in security terms, it failed open, or in > short it failed! > > Now consider this: you have a prepared statement. When you are not > escaping, you will not take out your binding as a first measure, and > if you do, your query will just fail. Same thing if you b64 your > variables (which is how you do it inline with js, and sql that has a > b64 function (MsSQL for example)). > > I should mention that b64 is a wonderous solution to this issue. You > can just encode your variables in b64. > > This is really cool, because assume this > > $var=b64en($user_input); > .... > query("select * from foo where b64de(a)=b64de($var)"); > Or just drop b64de in the query. > and this works fine and to prevent display attack. > > echo "hello <script>document.write(b64d('$var'));</script>"; > > And it's ok that if you drop that initial encode this code is capable > of being injected, if you drop that b64en, you will fail at the > database, and fail all the time. This is known as failing closed, > meaning that as a programmer it is too much work to undo the security, > and even if you undo it for yourself, assuming average code reuse, you > will have broken everyone else's code, otherwise known as "getting > fired"... > > Ok, crazy things to do, so i gotta run > > > ~ Alex -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php