RE: Convert UTF-8 to PHP defines


 



From: Ashley Sheridan

>On Thu, 2010-05-27 at 12:08 -0400, Adam Richardson wrote:
>
>> On Thu, May 27, 2010 at 9:45 AM, Guus Ellenkamp
>> <Ellenkamp_Guus@xxxxxxxxxxx>wrote:
>> 
>> > Thanks, but are you sure of that? I did some research a while ago and found
>> > that officially PHP files should be ascii and not have any specific
>> > character encoding. I believe it will work anyhow (did not try this one),
>> > but would like to stick with the standards.
>> >
>> > "Ashley Sheridan" <ash@xxxxxxxxxxxxxxxxxxxx> wrote in message
>> > news:1274883714.2202.228.camel@xxxxxxxxxxxx
>> > > On Wed, 2010-05-26 at 22:20 +0800, Guus Ellenkamp wrote:
>> > >
>> > >> We use PHP defines for defining text in different languages. As far as I
>> > >> know PHP files are supposed to be ASCII, not UTF-8 or something like
>> > >> that.
>> > >> What I want to make is a conversion program that would convert a given
>> > >> UTF-8
>> > >> file with the format
>> > >>
>> > >> definetext1=this is a text in random UTF-8, probably arabic or similar
>> > >> text
>> > >> definetext2=this is another text in random UTF-8, probably arabic or
>> > >> similar
>> > >> text
>> > >>
>> > >> into a file with the following defines
>> > >>
>> > >>
>> > >> define('definetext1',chr(<t_value>).chr(<h_value>).chr(<i_value>)...
>> > >> .chr(<x_value>).chr(<t_value>));
>> > >>
>> > >> define('definetext2',chr(<t_value>).chr(<h_value>).chr(<i_value>)...
>> > >> .chr(<x_value>).chr(<t_value>));
>> > >>
>> > >> Not sure if I'm using the correct chr/ord function, but I hope the above
>> > >> is
>> > >> clear enough to make clear what I'm looking for. Basically the output
>> > >> file
>> > >> should be ascii and not contain any utf-8.
>> > >>
>> > >> Any advice? htmlspecialchars() did not seem to work for the Vietnamese
>> > >> text I tried to convert, so something seems to go wrong when I just read
>> > >> an array of strings, convert the strings, and put them in defines.
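
For what it's worth, the conversion described above could be sketched roughly as follows. This is only a minimal sketch, assuming each input line has the form name=text and that the text is already valid UTF-8; the function name lineToDefine() is hypothetical, and it needs PHP 7.4 or later for the arrow function and the "\u{...}" escape.

```php
<?php
// Hypothetical helper (not from the original thread): turn one "name=text"
// line into a define() statement whose source is pure 7-bit ASCII.
function lineToDefine(string $line): ?string
{
    $line = rtrim($line, "\r\n");
    $pos = strpos($line, '=');
    if ($pos === false || $pos === 0) {
        return null; // not a "name=text" line
    }
    $name = substr($line, 0, $pos);
    $text = substr($line, $pos + 1);
    if ($text === '') {
        return "define('" . addslashes($name) . "', '');";
    }
    // PHP strings are byte sequences, so str_split() yields one byte per
    // element; a multibyte UTF-8 character becomes several chr() calls.
    $parts = array_map(
        fn(string $byte): string => 'chr(' . ord($byte) . ')',
        str_split($text)
    );
    return "define('" . addslashes($name) . "', " . implode('.', $parts) . ");";
}

// "h" followed by U+00E9, which UTF-8 encodes as the bytes 195 and 169.
echo lineToDefine("greeting=h\u{00E9}"), "\n";
// prints: define('greeting', chr(104).chr(195).chr(169));
```

The generated file contains only ASCII, while the defined constant still holds the original UTF-8 bytes at run time.
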
>> > >
>> > > PHP files can contain UTF-8, and in fact that is the preference of most
>> > > developers I know.
>> > >
>> >
>> Because the lower range of UTF-8 matches the ASCII character set
>> (intentionally, by design), you'll be able to use UTF-8 for PHP files without
>> problems (i.e., 7-bit ASCII characters have the same encoding in UTF-8).
>> http://www.cl.cam.ac.uk/~mgk25/unicode.html
>> 
>> However, if you were to use any of the multibyte characters of UTF-8 in a
>> PHP file, you could run into some trouble. I use UTF-8 for most of my PHP
>> files, but I've been sticking to the ASCII subset exclusively.
>
> I don't use the higher range of characters often, but I do sometimes use
> them for things like graphical glyphs (½✉✆, etc.). I know I could do
> those with regular text and the Wingdings font, but that's not available
> on every computer, and it breaks the semantic meaning behind the glyphs.

What higher range? ASCII defines only 128 values (0-127); the first 32 (0-31) are non-printing control characters, as is 127 (DEL). Anything beyond that is not ASCII but a proprietary extension. In particular, the glyphs usually associated with 0-31 and 128-255 are IBM-specific and not guaranteed to be present outside of their original video ROM. So only those 128 ASCII values map directly into UTF-8; every other character is encoded as two or more bytes.
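
A quick way to check this from PHP itself is to look at the raw bytes of a string. The snippet below is only an illustrative sketch; the helper names isSevenBitAscii() and byteValues() are made up for this example, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: true when every byte of $s is in the range 0-127.
function isSevenBitAscii(string $s): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/^[\x00-\x7F]*$/D', $s) === 1;
}

// Hypothetical helper: list the byte values PHP actually stores for $s.
function byteValues(string $s): array
{
    return $s === '' ? [] : array_map('ord', str_split($s));
}

var_dump(isSevenBitAscii('define'));   // bool(true)
var_dump(isSevenBitAscii("\u{00BD}")); // bool(false): U+00BD takes two bytes
print_r(byteValues("A\u{00BD}"));      // 65, then 194 and 189
```

The "A" stays a single byte with its ASCII value, while the one-half sign becomes two bytes that are both above 127.
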

Bob McConnell

Ref: pp 25-29 The Programmer's PC Sourcebook, 1988, Thom Hogan, Microsoft Press


