RE: Convert UTF-8 to PHP defines

Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx> · Thu, 27 May 2010 19:11:21 +0100

On Thu, 2010-05-27 at 14:06 -0400, Bob McConnell wrote:

> From: Ashley Sheridan
> 
> >On Thu, 2010-05-27 at 12:08 -0400, Adam Richardson wrote:
> >
> >> On Thu, May 27, 2010 at 9:45 AM, Guus Ellenkamp
> >> <Ellenkamp_Guus@xxxxxxxxxxx> wrote:
> >> 
> >> > Thanks, but are you sure of that? I did some research a while ago and found
> >> > that officially PHP files should be ascii and not have any specific
> >> > character encoding. I believe it will work anyhow (did not try this one),
> >> > but would like to stick with the standards.
> >> >
> >> > "Ashley Sheridan" <ash@xxxxxxxxxxxxxxxxxxxx> wrote in message
> >> > news:1274883714.2202.228.camel@xxxxxxxxxxxx
> >> > > On Wed, 2010-05-26 at 22:20 +0800, Guus Ellenkamp wrote:
> >> > >
> >> > >> We use PHP defines for defining text in different languages. As far as I
> >> > >> know PHP files are supposed to be ASCII, not UTF-8 or something like
> >> > >> that.
> >> > >> What I want to make is a conversion program that would convert a given
> >> > >> UTF-8 file with the format
> >> > >>
> >> > >> definetext1=this is a text in random UTF-8, probably arabic or similar text
> >> > >> definetext2=this is another text in random UTF-8, probably arabic or similar text
> >> > >>
> >> > >> into a file with the following defines
> >> > >>
> >> > >>
> >> > >> define('definetext1',chr(<t_value>).chr(<h_value>).chr(<i_value>)...chr(<x_value>).chr(<t_value>));
> >> > >>
> >> > >> define('definetext2',chr(<t_value>).chr(<h_value>).chr(<i_value>)...chr(<x_value>).chr(<t_value>));
> >> > >>
> >> > >> Not sure if I'm using the correct chr/ord function, but I hope the above
> >> > >> is clear enough to show what I'm looking for. Basically the output file
> >> > >> should be ASCII and not contain any UTF-8.
> >> > >>
> >> > >> Any advice? The htmlspecialchars function did not seem to work for the
> >> > >> Vietnamese text I tried to convert, so something seems to go wrong with
> >> > >> just reading an array of strings, converting the strings, and putting
> >> > >> them in defines.
> >> > >
> >> > > PHP files can contain UTF-8, and in fact that is the preference of most
> >> > > developers I know.
> >> > >
> >> >
> >> Because the lower range of UTF-8 matches the ascii character set
> >> (intentionally by design), you'll be able to use UTF-8 for PHP files without
> >> problem (i.e., ascii 7-bit chars have same encoding in UTF-8.)
> >> http://www.cl.cam.ac.uk/~mgk25/unicode.html
> >> 
> >> However, if you were to use any of the multibyte characters of UTF-8 in a
> >> PHP file, you could run into some trouble.  I use UTF-8 for most of my PHP
> >> files, but I've been sticking to the ASCII subset exclusively.
> >
> > I don't use the higher range of characters often, but I do sometimes use
> > them for things like the graphical glyphs (½✉✆, etc.). I know I could do
> > those with regular text and the Wingdings font, but that's not available
> > on every computer, and it breaks the semantic meaning behind the glyphs.
> 
> What higher range? ASCII only defined 128 values, the bottom 32 being control characters that don't print. Anything outside of that is not ASCII, but a proprietary extension. In particular, the glyphs usually associated with 0-31 and 128-255 are IBM specific and not guaranteed to be present outside of their original video ROM. So only the first 128 characters map directly into UTF-8.
> 
> Bob McConnell
> 
> Ref: pp 25-29 The Programmer's PC Sourcebook, 1988, Thom Hogan, Microsoft Press

I meant the higher range of UTF-8 characters, the ones that don't map to ASCII values.
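
For what it's worth, here is a rough sketch of the difference, and of the kind of chr() conversion the original question asked about. It assumes PHP 7 or later, and the helper name utf8_to_chr_expr is made up purely for illustration; it isn't from anyone's code in this thread. ASCII characters come out as a single byte below 128, while a higher-range character such as ½ (U+00BD) comes out as two bytes, both 128 or above.

```php
<?php
// Hypothetical helper (name invented for this example): turn a UTF-8 string
// into an ASCII-only PHP expression built from chr() calls, one per byte.
function utf8_to_chr_expr(string $text): string
{
    $parts = [];
    // strlen() counts bytes, not characters, so this walks the raw UTF-8 bytes.
    for ($i = 0, $len = strlen($text); $i < $len; $i++) {
        $parts[] = 'chr(' . ord($text[$i]) . ')';
    }
    // An empty input still has to produce a valid PHP expression.
    return $parts === [] ? "''" : implode('.', $parts);
}

// "A" is plain ASCII; "\xC2\xBD" is the UTF-8 byte sequence for ½ (U+00BD).
$expr = utf8_to_chr_expr("A\xC2\xBD");

// Emit a define line whose source text is pure ASCII.
echo "define('definetext1', " . $expr . ");\n";
```

Evaluating the generated expression gives back the original bytes, so the define still holds UTF-8 text at runtime; only the source file is restricted to ASCII.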

Thanks,
Ash
http://www.ashleysheridan.co.uk