Re: utf-8 ?

On 4/30/09 6:15 PM, "Reese" <howell.r@xxxxxxxxxxxxxxxx> wrote:

> Tom Worster wrote:
> 
>> why use SGML character entity references in a utf-8 file or stream? can't
>> you just put the character in the file?
> 
> Because, I thought, HTML files were basically just text files with
> different file extensions, and that those other characters would not
> store or display properly if saved in .txt format. Was I mistaken?

yes. see http://www.w3.org/TR/html401/charset.html

which says that html uses the UCS, a character-by-character equivalent to
the Unicode character set. so if you use a unicode character encoding (such
as utf-8) then you have a direct encoding for every unicode character in
html.

so a utf-8 html file or stream should normally have no entities other
than &lt;, &gt;, &amp; and perhaps &quot; as needed.
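
to illustrate the point, here is a minimal sketch in python (not php, and the
sample string is made up for the example). it escapes only the characters
that are special in html markup and leaves the turkish letters as they are,
then encodes the result as utf-8:

```python
import html

# Hypothetical sample text: Turkish letters plus characters special in HTML.
text = 'Işık & gölge: "ğ" < "ş"'

# html.escape() replaces only &, <, > and (by default) the quote characters.
escaped = html.escape(text)

# The non-ASCII letters are untouched and are encoded directly as UTF-8 bytes.
data = escaped.encode("utf-8")

print(escaped)
print(data.decode("utf-8") == escaped)
```

the letters never need numeric or named references here, provided the page is
actually served and read as utf-8.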

text files may be utf-8 encoded too. xml, json and csv files are more
examples of text files that can be utf-8 encoded and use any unicode
character simply and directly. windows and os-x have supported utf-8 text
files for many years now.


> http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt
> 
> That is supposed to be a UTF-8 encoded text file, between 1/3 and 1/2
> of the characters do not display correctly on my screen.

this page looks great to me. i don't appear to have the runes or amharic
fonts on my computer so those aren't showing but everything else works.

why this doesn't work for you is not clear. it could be that your browser
has a preference configured to override the charset specified in the http
headers. or perhaps the browser does not observe the specified content type
for txt files.
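
as a rough sketch of what could be going wrong (an assumption, not a diagnosis
of your setup), the python below reads the charset out of a content-type
header value and shows what happens when utf-8 bytes are decoded with a
different charset. the helper name charset_from_content_type and the sample
header are hypothetical:

```python
from email.message import Message

def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset lowercased, or `default`.
    return msg.get_content_charset(default)

# Hypothetical header, as a server might send it for a UTF-8 text file.
declared = charset_from_content_type("text/plain; charset=UTF-8")

sample = "ğ"
raw = sample.encode("utf-8")

# Decoding with the declared charset round-trips the character.
right = raw.decode(declared)

# Decoding the same bytes as latin-1 (as if the declared charset were
# overridden or ignored) yields two wrong characters instead of one right one.
wrong = raw.decode("latin-1")

print(declared, right == sample, len(wrong))
```

if a good share of the demo file shows up as pairs of odd accented characters,
that points at this kind of charset mismatch rather than at missing fonts.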


> Either way,
> this next link suggests that Turkish characters with no equivalent in
> the English language should be encoded for Web display:
> 
> http://webdesign.about.com/od/localization/l/blhtmlcodes-tr.htm

don't believe everything you read on the web. while some browsers may
tolerate it, i don't think pages encoded according to those suggestions
would even be valid html.


> And because that is off-topic, I'll throw this in:
> 
> The consensus seems to be that the proposed "ifset()" and "ifempty()"
> functions are more effort than they are worth. What I'd like to know
> is, why "empty()" still exists when every time I turn around, the
> mentors I turn to locally tell me not to use it, to use "isset()"
> instead. Because empty() doesn't work with zero. Anyone care to take
> a stab at that?

perhaps because it's hard to get rid of language elements without breaking
existing code?



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

