On Thu, January 3, 2008 11:39 am, tedd wrote:
> At 4:24 PM +0100 1/3/08, Nisse Engström wrote:
>> On Wed, 2 Jan 2008 19:36:56 -0500, tedd wrote:
>>
>>> To find out, I did put the operation through FireFox and reversed
>>> the POST/GET operations to get a look at the string -- it is:
>>>
>>> %C2%A0%C2%A0%C2%A0Z%C2%A0%C2%A0%C2%A0 < where Z is the value
>>> passed.
>>>
>>> Now, C2 (HEX) is a linefeed (194 DEC)
>>>
>>> And, A0 (HEX) is a non-breaking space (160 DEC;) which is a
>>
>> Not quite. <A0> is non-breaking space in *some* character
>> encodings, such as the ISO-8859-... encodings. It may
>> be different in other encodings. In UTF-8, it is <C2 A0>,
>> which is exactly what you're seeing.
>
> Well considering that UTF-8 encompasses/includes all of the code
> points found in ISO-8859, then I think that both encodings would
> reference the same character. After all, if they didn't then what's
> the point of Unicode?
>
> Now, one can argue how many bytes are needed to represent a character
> in what encoding, but that doesn't change the character. In the end,
> I believe that <A0> is the same regardless of what charset or
> encoding you're using.
>
> I just don't understand where C2 comes from or why it's there. I
> would think that <00 A0> would be more appropriate.
>
>>> Therefore, if I simply use:
>>>
>>> $submit = str_replace( chr(194), '', $submit );
>>> $submit = str_replace( chr(160), '', $submit );
>>>
>>> This is the solution.
>>
>> Hardly.
>
> If you mean my solution doesn't work, then you are mistaken -- it
> works for me.
>
>>> Now, why does a POST operation add in C2's? I'll leave that for
>>> another post. :-)
>>
>> I haven't had time to look at the code, but perhaps you
>> need to specify a character encoding for the page.
>
> That's a valid point. Not only the encoding that's declared for the
> page via its HTML DOCTYPE, but also what encoding was used to
> actually save that file on the server.
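On where the C2 comes from: the code point is the same in both cases
(U+00A0), but the bytes differ. ISO-8859-1 stores it as the single
byte A0, while UTF-8 spreads any code point from U+0080 to U+07FF
over two bytes of the form 110xxxxx 10xxxxxx. Here is a rough sketch,
assuming the submitted value really is UTF-8 (which the %C2%A0 pairs
suggest). The helper names utf8_two_byte() and strip_nbsp() are made
up for illustration; they are not from the code in this thread.

```php
<?php
// Hypothetical helper: encode a code point in the range
// U+0080..U+07FF as two UTF-8 bytes (110xxxxx 10xxxxxx).
function utf8_two_byte(int $cp): string
{
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: remove U+00A0 as one two-byte unit instead
// of stripping the bytes C2 and A0 independently.
function strip_nbsp(string $s): string
{
    return str_replace("\xC2\xA0", '', $s);
}

// 0xA0 >> 6 is 2, so the lead byte is 0xC0 | 0x02 = 0xC2.
// 0xA0 & 0x3F is 0x20, so the second byte is 0x80 | 0x20 = 0xA0.
echo bin2hex(utf8_two_byte(0xA0)), "\n"; // c2a0

// The string from the thread, decoded back to raw bytes.
$submit = rawurldecode('%C2%A0%C2%A0%C2%A0Z%C2%A0%C2%A0%C2%A0');
echo strip_nbsp($submit), "\n"; // Z

// Byte-by-byte stripping happens to work on that string, but it
// damages other UTF-8 text: U+00E0 (a with grave) is C3 A0, and
// removing the A0 byte leaves a lone C3, which is invalid UTF-8.
$damaged = str_replace(array(chr(194), chr(160)), '', "\xC3\xA0");
echo bin2hex($damaged), "\n"; // c3
```

So the two str_replace() calls do give the right answer for this one
input, but only as long as nothing else in the value uses C2 or A0
as part of another character.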
>
> This entire encoding process is more involved than it looks, or so it
> appears to me.

Perhaps you should be taking a whitelist approach to filtering
input?... :-)

In other words, only allow the specific character combinations you
expect to see, and ignore any other goofy characters that were encoded
from &nbsp;.

Or, possibly, try using just spaces and not &nbsp; for the value -- I
suspect that the browsers will NOT collapse the spaces in the VALUE
since it's data, not HTML content...

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/from/lynch
Yeah, I get a buck. So?

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
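
A minimal sketch of the whitelist idea mentioned above, assuming the
expected values contain only ASCII letters, digits, plain spaces,
hyphens and underscores. That allowed set is a guess for illustration;
adjust it to whatever the form is really supposed to accept. The
function name whitelist_filter() is likewise made up.

```php
<?php
// Hypothetical helper: keep only characters on an explicit
// whitelist and drop everything else.
function whitelist_filter(string $input): string
{
    // Without the /u modifier the pattern is applied byte by byte,
    // so both bytes of a UTF-8 non-breaking space (C2 A0) fall
    // outside the allowed class and are removed.
    return preg_replace('/[^A-Za-z0-9 _-]/', '', $input);
}

$submit = rawurldecode('%C2%A0%C2%A0%C2%A0Z%C2%A0%C2%A0%C2%A0');
echo whitelist_filter($submit), "\n"; // Z
```

The trade-off is that a whitelist this narrow also throws away any
legitimate non-ASCII input, so it only fits fields where such input
is never expected.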