On Tue, 2008-11-25 at 17:09 +0000, ceo@xxxxxxxxx wrote:
> I already had a function to go from weird MS-Word characters to HTML Entities, which I was putting into the DB as such.
>
> In retrospect, that function should have been called at output... Actually, I knew it should have, but convincing my co-workers was the proverbial brick wall, so I cheated and did it on data import and now I'm paying for it...
>
> I ended up just copy-pasting the entity/number table from here:
> http://www.w3schools.com/tags/ref_entities.asp
> and go through a 2-step process:
>
> RAW DATA (pasted from Word in unknown code-page/charset/encoding)
> HTML Name Entities
> HTML Numeric Entities
>
> This seemed to make the W3.org RSS validator "happy"
>
> It still looks goofy in the browser, but that's the problem of the users putting in this goofy stuff in the first place, so I'm shoving it back into their laps.
>
> My RSS feed validates, and the content within it not being "right" is the problem of the content creators. :-)
>
> [soapbox "on"]
> I'm pretty tired of dealing with this charset/codepage stuff, personally, after years of frustrating experiences, none ending in a real "solution"
>
> If anybody has a petition to abolish everything except for UTF-32, sign me up! :-v
>
> UTF-32 is the biggest, right? The one that has ALL characters anybody needs?...
>
> Hey, I don't care, UTF-64 or UTF-128 is fine by me too. Disk space is cheap.
>
> Just stop the insanity of endless incompatible irreversible calculations to substitute a bunch of numeric codes for characters, and make it socially unacceptable to use anything other than the one true encoding.
>
> I'm sure somebody somewhere actually enjoys dealing with this [bleep], but I'm betting the majority are quite tired of it.
> [/soapbox]

I came across a similar problem using an AJAX thing, with MSWord characters in the text.
The way round the problem was to enclose everything inside CDATA blocks. That kept the browsers happy, because the entities then only had to be understood by the HTML browser, not by the XML parser. As RSS is an XML format, maybe this would help you?

Ash
www.ashleysheridan.co.uk
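
For anyone following along, here is a rough sketch of the two approaches mentioned in this thread: turning named HTML entities into numeric ones, and wrapping the text in a CDATA section so the XML parser leaves the entities alone. It is written in Python purely for illustration; the function names `named_to_numeric` and `wrap_cdata` are made up for this example and are not from either poster's code.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML itself predefines; these can stay as names.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def named_to_numeric(text):
    """Rewrite named HTML entities (&rsquo;) as numeric ones (&#8217;).

    Unknown names and the XML-predefined ones are left untouched.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


def wrap_cdata(text):
    """Wrap text in a CDATA section.

    A literal "]]>" would close the section early, so it is split
    across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    raw = "It&rsquo;s a &ldquo;smart&rdquo; quote &amp; more"
    print(named_to_numeric(raw))

    # The XML parser hands the entity text through unchanged,
    # leaving it for the HTML renderer to interpret.
    item = "<description>" + wrap_cdata(raw) + "</description>"
    print(ET.fromstring(item).text)
```

The numeric route keeps the feed readable by any XML parser, while the CDATA route defers entity handling to whatever renders the HTML; which one suits a given feed reader is something to test rather than assume.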