Re: Identifying and removing ?line-return characters from MySQL search returns

Those characters are non-ASCII, so they are not ASCII 10 or ASCII 13.

They most likely come from one of the following:

1.
Microsoft Word (and others) uses non-ASCII characters for all kinds of
fun things like "curly quotes" and "em-dash" and "ellipsis" and so on.
The User Contributed Notes at http://php.net/htmlentities
include several ways to deal with these.

2.
Non-English characters.  They could easily be Spanish, French, German,
etc.
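
To find out which of these you actually have, it can help to dump the raw bytes of one of the bad records. This is only a rough sketch: the function name suspicious_bytes() is made up for illustration, and it assumes you already have the text in a PHP string (for example the body column of one row).

```php
<?php
// Hypothetical helper (not from the original post): lists every byte that
// is not printable ASCII, keyed by its offset in the string, as hex.
function suspicious_bytes(string $text): array
{
    $found = [];
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $code = ord($text[$i]);
        // Printable ASCII is 32..126. Everything else is reported,
        // including tab (09), line feed (0A) and carriage return (0D).
        if ($code < 32 || $code > 126) {
            $found[$i] = sprintf('%02X', $code);
        }
    }
    return $found;
}

// Example: a CR/LF pair plus two Windows-1252 curly quotes.
print_r(suspicious_bytes("one\r\ntwo \x93quoted\x94"));
```

If the list shows only 0A and 0D, the problem really is line returns. Single bytes in the range 80 to 9F usually point to Windows-1252 punctuation (case 1 above), and other bytes above 7F to accented letters or multi-byte UTF-8 sequences (case 2).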

The best way to deal with this is to make EVERYTHING be UTF-8 from
beginning to end.

Your web page FORM, and all the output pages, must use:
<?php header('Content-Type: text/html; charset=UTF-8'); ?>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Note that MS IE is badly broken (shocking, I know) and IGNORES the
header; it counts the characters that fall in or out of bounds for each
charset to "guess" at the charset for your page.
For reasons known only to MS, it does honor the META tag above, however.

Your MySQL server *and* your MySQL client (possibly bundled with PHP)
must use UTF-8.
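
On the client side, something along these lines may do it. This is a sketch under assumptions: it uses the mysqli extension, the helper name use_utf8_connection() is invented here, and 'utf8mb4' (MySQL's name for full UTF-8) needs a reasonably recent server. The server, table and column character sets are a separate setting and are not touched by this.

```php
<?php
// Hypothetical helper: switches an already-open mysqli connection to UTF-8.
function use_utf8_connection(mysqli $db): void
{
    // set_charset() sets the character set used between PHP and MySQL.
    // It returns false on failure, so report that instead of carrying on.
    if (!$db->set_charset('utf8mb4')) {
        throw new RuntimeException(
            'Could not switch the connection to UTF-8: ' . $db->error
        );
    }
}
```

You would call it once, right after connecting and before running any queries.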

You will also need to CONVERT all the existing data in your database,
which may be a jumble of Latin1 and UTF-8 and god-knows-what.  Here's
a great blog post from somebody who went through this:
http://www.oreillynet.com/pub/wlg/9022?wlg=yes
Hopefully his techniques will be useful to you as well.
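
Not taken from that blog post, but as a rough illustration of the idea: a converter can leave valid UTF-8 alone and treat everything else as one legacy charset. The function name to_utf8() is made up, and the guess that the legacy data is Windows-1252 (Latin1 plus the Word punctuation) is an assumption you would have to check against your own records.

```php
<?php
// Hypothetical helper: returns $text as UTF-8, ASSUMING anything that is
// not already valid UTF-8 was stored as Windows-1252.
function to_utf8(string $text): string
{
    // With the /u modifier, preg_match() fails on invalid UTF-8,
    // so a result of 1 means the string is already valid (ASCII included).
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $converted = @iconv('CP1252', 'UTF-8', $text);
    // iconv() returns false when it meets a byte it cannot convert;
    // hand back the original then, so nothing is silently lost.
    return $converted === false ? $text : $converted;
}
```

Run something like this over a copy of the data first. A few Latin1 byte pairs happen to be valid UTF-8 as well, so spot-check the results before overwriting anything.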


On Wed, February 1, 2006 4:24 pm, Dougal Watson wrote:
> Hi Everyone,
>
> My problem is not related to PHP itself, but I hope my solution is.
>
> I run a website with much of the material being fed from several MySQL
> databases. Some of the material is fed into the databases through a PHP
> mailbot and some directly uploaded from a FileMaker database on a
> desktop Mac.
>
> My problem is that when viewed using Safari on my Mac the search results
> show a lot of "unknown character" symbols (squares with diagonal crosses
> over them). This does not occur when MS Explorer is used. My workaround
> has been to convert the output to html and rely on the fact that the
> vast majority of site visitors use MS Explorer ... But I would really
> like to get rid of those characters.
>
> I had pretreated the text to replace ASCII 10 characters before
> uploading to MySQL (or so I thought) but I suspect my problem still
> relates to either ASCII 10, ASCII 13 or some other control character
> that is represented differently on the Mac and Windows browsers. It may
> be that the FileMaker text download puts ASCII 10 or something else back
> into the text ... this is not a problem for the content that is fed
> directly into the database by the PHP mailbot.
>
> This leads me to my two questions:
> 1. How can I determine what that invisible character is ... I'm
> guessing ASCII 10 (which I thought I'd removed) but don't know how to
> check;
> 2. How can I adjust my PHP code to 'treat' the MySQL output to
> remove that character on its way to the browser? I already use eregi and
> other string functions to cloak email addresses and otherwise clean up
> the output ... but don't know how to find a (probably) control character
> such as this.
>
> I'm sorry if this is a very rudimentary, PHP 101, query ... but I am a
> pretty rudimentary coder.
>
> Cheers
> Dougal
>
>
> An example can be seen at (copy-paste works but clicking on the link
> doesn't seem to):
> http://aeromedical.org/List/archive_aeromed-list/archive_search_aeromed-list.php?form=yes&first_hit_on_page=0&query=%20SELECT%20datetime,%20author_email,%20title,%20body,%20author.person%20AS%20author_nameFROM%20`postings`,%20`author`%20%20WHERE%20(((body%20REGEXP%20'e')%20OR%20(title%20REGEXP%20'e'))%20AND%20((YEAR(datetime)%20>=%20'1995')%20AND%20(YEAR(datetime)%20<=%20'3000'))%20AND%20(postings.author_email%20=%20author.email)%20)%20ORDER%20BY%20datetime%20DESC%20%20LIMIT%200,%2010&count=963&searchstring01=e&choice3=and&searchstring02=&choice4=and&searchstring03=&startyear=1995&endyear=3000&show_full_posting=2006-01-24%2011:39:26
> ... but it will look fine if you're using MS Explorer.
>
>
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>


-- 
Like Music?
http://l-i-e.com/artists.htm
