C Drozdowski wrote:
> I have been doing some testing and need confirmation that the following
> is correct.
>
> You have a DOMDocument that potentially contains UTF-8 encoded data (it
> might not however).
>
> You want to search it via DOMXpath->query() using a value that comes
> from a $_POST value.
>
> If the page that posts the data via a form to the search script IS NOT
> encoded in UTF-8, then the value must be converted to UTF-8 before it is
> used in the query expression.
>
> Else, if the posting page IS UTF-8 encoded, then the $_POST data does
> not need to be converted before being used in the expression.
>
> Is this correct?

AFAIK... yes, this is correct.

> Also, if the $_POST data comes from a UTF-8 encoded page, and it needs
> to be sanitized before use, will the basic PHP string functions work on
> the data (e.g. htmlentities, stripslashes, trim, preg_replace, etc)?
>
> If not what do I have to do?

I believe that PHP uses ISO-8859-1 as the default encoding, but there are
ways around it. htmlentities() will let you specify UTF-8 encoding.
Remember that your DOMDocument may / may not be whitespace-sensitive, so
be careful about how / if you trim().

I don't know how well stripslashes, preg_replace, etc. work with UTF-8.
Hopefully someone else will be able to help out with those...

--
Teach a man to fish...

NEW?  | http://www.catb.org/~esr/faqs/smart-questions.html
STFA  | http://marc.theaimsgroup.com/?l=php-general&w=2
STFM  | http://php.net/manual/en/index.php
STFW  | http://www.google.com/search?q=php
LAZY  | http://mycroft.mozdev.org/download.html?name=PHP&submitform=Find+search+plugins
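
For what it's worth, the convert-then-query flow discussed above could look roughly like the sketch below. It is only an illustration, not a tested recipe: to_utf8() and xpath_literal() are hypothetical helper names, the sample XML and the posted value are made up, and it assumes the mbstring extension is loaded and that you already know the charset of the posting page. The /u modifier tells preg_replace() to treat both the pattern and the subject as UTF-8.

```php
<?php
// Hypothetical helper: convert a posted value to UTF-8.
// $pageEncoding is an assumption: the charset of the page that posted the form.
function to_utf8(string $value, string $pageEncoding): string
{
    if (strcasecmp($pageEncoding, 'UTF-8') === 0) {
        return $value; // already UTF-8, nothing to convert
    }
    return mb_convert_encoding($value, 'UTF-8', $pageEncoding);
}

// Hypothetical helper: build an XPath 1.0 string literal.
// XPath 1.0 has no escape character, so a value containing both
// quote types has to be assembled with concat().
function xpath_literal(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    return "concat('" . implode("', \"'\", '", explode("'", $value)) . "')";
}

// Made-up document; "\xC3\xA9" is the UTF-8 byte sequence for e-acute.
$doc = new DOMDocument();
$doc->loadXML("<people><name>Jos\xC3\xA9</name><name>Ana</name></people>");

// Stand-in for $_POST['q'] from an ISO-8859-1 page: e-acute is the single byte 0xE9.
$posted = " Jos\xE9 ";

$needle = to_utf8(trim($posted), 'ISO-8859-1');
// Collapse runs of whitespace; /u makes the regex UTF-8 aware.
$needle = preg_replace('/\s+/u', ' ', $needle);

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//name[. = ' . xpath_literal($needle) . ']');

echo $nodes->length, "\n";
```

If the posting page is already UTF-8, pass 'UTF-8' as the second argument and the value goes through unchanged. Without the conversion step, the single-byte 0xE9 would not be valid UTF-8, so the query could not be expected to match the node.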