Re: XML and special characters

Adam Hubscher wrote:
tedd wrote:

I've been having a tough time with parsing XML files and special characters.

-snip-

Any suggestions as to how I could get around this seemingly impossible roadblock that's been placed by what seems to be the XML engines? :O



Adam:

I believe that these "special" characters will be with us for a long while. I suggest that you review the Unicode database for these characters and use their code points (hexadecimal values). For example, 0061 is a small "a", 2022 is a "bullet", 2713 is a "check mark", and so on. Most of the world's language glyphs are represented in the Unicode database.
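To make the code-point idea concrete, here is a minimal sketch in Python (chosen only for illustration; the helper names are hypothetical). It assumes the code points are handled as hexadecimal strings, turns each into its character, and builds the numeric character reference form (`&#x...;`) that any XML parser accepts without a DTD.

```python
import xml.etree.ElementTree as ET

# Code points mentioned above, as hexadecimal strings.
code_points = {"0061": "small a", "2022": "bullet", "2713": "check mark"}

def to_char(hex_code: str) -> str:
    """Return the character for a hexadecimal Unicode code point."""
    return chr(int(hex_code, 16))

def to_xml_reference(hex_code: str) -> str:
    """Return the XML numeric character reference for the code point."""
    return f"&#x{hex_code};"

for hex_code, label in code_points.items():
    reference = to_xml_reference(hex_code)
    # The parser resolves the reference back to the character itself.
    parsed = ET.fromstring(f"<c>{reference}</c>").text
    assert parsed == to_char(hex_code)
    print(label, reference)
```

The references are plain ASCII, so they survive whatever encoding the file happens to be saved in.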

HTH's

tedd

Oh, I understand that they'll be here for a while.

The problem is that the XML file is not my own; it's generated by another service that I am creating a stemmed service for. I feel I have already asked much of the owner of that service in getting him to produce a properly formed XML file. (He was simply using pseudo-XML that, although nice and organized, could not be parsed at all and took forever with pregs. Now that it runs through an XML generator, the script takes less time on his end too, and he's thankful for that.)

There are usernames listed in the file that use these special characters.

I'd rather not have him go through and edit the 30,000-odd users that are indexed. If there is a way for the XML writer to emit hex codes instead of the Unicode characters automatically (and, on that note, a way for a parser to read them back automatically), then the idea is feasible.
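As a rough illustration of what "automatic" could look like on both ends, here is a sketch using Python's standard `xml.etree.ElementTree` (not the generator actually in use, which the thread does not name). It assumes a hypothetical `<user>` element: serializing with an ASCII encoding makes the writer emit numeric character references for every non-ASCII character, and the parser turns them back into characters on read. The references come out in decimal rather than hex, which is equivalent as far as XML is concerned.

```python
import xml.etree.ElementTree as ET

def write_ascii_xml(username: str) -> bytes:
    """Serialize a hypothetical <user> element as pure ASCII.

    Characters outside ASCII are written as numeric character
    references (for example &#8226; for a bullet).
    """
    user = ET.Element("user")
    user.text = username
    return ET.tostring(user, encoding="us-ascii")

def read_username(xml_bytes: bytes) -> str:
    """Parse the element; references are decoded automatically."""
    return ET.fromstring(xml_bytes).text

original = "caf\u00e9 \u2022"
data = write_ascii_xml(original)
restored = read_username(data)
```

Nobody has to edit the usernames by hand in this scheme; the conversion happens entirely in the writer and the parser.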

Failing that, I'm trying to find a way to parse the existing file, whose Unicode data causes a fatal error in the parser.
ee dee da da da? &sect;&eth; <-- the parts that look like HTML entities are the characters in question. I was mistaken: they are HTML entities, which is even odder to me.

I apologize for referring to UTF-8 earlier; they do not decode as UTF-8, they decode as HTML entities. However, I keep trying methods to get the parser to read them, and it still does not read them properly.
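For what it's worth, the fatal error is consistent with how XML treats entities: only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;` are predefined, so a named HTML entity such as `&sect;` is undefined unless a DTD declares it. One possible workaround, sketched here in Python as an assumption and not as a fix confirmed against the real file, is to rewrite the named HTML entities as numeric character references before handing the text to the parser. The `<user>` element and the function name are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML itself defines; these must be left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def html_entities_to_numeric(xml_text: str) -> str:
    """Replace named HTML entities with numeric character references."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own and unknown names alone
        return f"&#{name2codepoint[name]};"
    return ENTITY.sub(replace, xml_text)

raw = "<user>&sect;&eth; &amp; co</user>"
fixed = html_entities_to_numeric(raw)
username = ET.fromstring(fixed).text
```

Parsing `raw` directly raises a parse error for the undefined entity, while `fixed` parses cleanly. Note that a regex pass like this would also rewrite entity-like text inside CDATA sections, so it is only safe if the file has none.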

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php