Re: Smart Quotes not so smart

On Wed, November 15, 2006 8:36 pm, Larry Garfield wrote:
> Client has a large MS SQL database with lots of data.  Some of that
> data includes "smart quotes", aka curly quotes, but not real ones.
> They're the "MS Word character encoding standards?  What's that?"
> smart quotes.  On their old setup (SQL Server 2k, OpenLink ODBC
> driver, IIS, PHP 4.0.6), they actually worked just fine.  On our old
> devel setup (the same but with a different ODBC driver), it worked
> fine.
>
> On our new devel setup (SQL Server 2k, OpenTDS ODBC driver, Apache,
> PHP 5.1.6), it works fine.  On their new live setup, however (same,
> but again not sure of the ODBC driver), they're getting the dreaded
> squares or question marks or accented characters that signify a
> garbled smart quote.  I know they're not Unicode characters because
> Windows, the DB server, and the driver are all set to either UTF-8 or
> UTF-16.

You are correct.  They are not "real" UTF-8 nor UTF-16 characters.

> We've tried eliminating middle-men to no avail.  I've also tried
> doing a find-replace on the smart quote characters before they're
> inserted into the database, copying and pasting them from Word, and
> PHP skips right past them and enters them into the database.

When you copy/paste in Word, MS Word probably changes the quotes back
into "smart quotes" automatically, right after you change them to
straight ones.

That's a feature of MS Word that you have to explicitly turn off in
their interminable preference menus, if you can find it.  That's not a
real option, as your users won't find it, won't turn it off, and will
turn it back on, or have it turned back on when they follow the
telephone support instructions to just wipe out their entire OS and
re-install everything from scratch because of one dodgy driver.
:-)

Meanwhile, back at the ranch...

> All we're left with is MAYBE telling them to try a different ODBC
> driver or else fixing the data by hand.  I don't like either option,
> myself.  Does anyone have any better ideas to suggest?  Any idea what
> those smart quotes actually are, and if they exist in ANY valid
> character set other than Word itself?

Sure!  They exist in ALL MS products.  Word, Excel, PowerPoint, etc.

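But they aren't valid anywhere else, of course, as they come from
Microsoft's own codepage, Windows-1252 (cp1252), where they sit at byte
values 0x91-0x94.  In ISO-8859-1 those bytes are control codes, and in
UTF-8 they are not valid on their own.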

That's the "extend" part of "embrace and extend" :-) :-) :-)
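To see what those bytes actually are, here is a small sketch (in Python, not PHP, purely for illustration) that decodes each cp1252 smart-quote byte and prints the Unicode character it maps to.  It assumes the garbled data really is cp1252; the helper name describe_cp1252 is hypothetical.  The point is that every one of them has a proper Unicode equivalent, so converting the data from cp1252 to UTF-8 would preserve them rather than mangle them.

```python
import unicodedata

def describe_cp1252(byte_value: int) -> str:
    """Show which Unicode character a single cp1252 byte decodes to."""
    ch = bytes([byte_value]).decode("cp1252")
    return f"0x{byte_value:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}"

# The four Word "smart quote" bytes in Windows-1252.
for b in (0x91, 0x92, 0x93, 0x94):
    print(describe_cp1252(b))
```

This prints the LEFT/RIGHT SINGLE QUOTATION MARK (U+2018, U+2019) and LEFT/RIGHT DOUBLE QUOTATION MARK (U+201C, U+201D) lines.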

Some MS engineer was probably specifically tasked with the goal of
making a non-interoperable charset, of course.  Or I suppose we could
be charitable and attribute it to stupidity instead of malice and
blame some "Designer" somewhere in the bowels of Redmond. :-)

Catch the data somewhere in PHP and use the functions from the User
Contributed notes on http://php.net/str_replace to replace the MS Word
chars with their ASCII or HTML equivalents.  Both versions are in the
User Contributed notes, along with variations on this theme.
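A minimal sketch of that kind of byte-level replacement, written in Python rather than PHP and assuming the incoming data is Windows-1252 (cp1252).  The table names and the replace_smart_chars helper are hypothetical; they are not the functions from the php.net notes, just an illustration of the same idea as a chain of str_replace calls.

```python
# Hypothetical tables: cp1252 "smart" punctuation bytes mapped to
# plain-ASCII and HTML-entity equivalents. Only common characters are
# covered; extend as needed.
CP1252_TO_ASCII = {
    b"\x91": b"'",    # left single quotation mark
    b"\x92": b"'",    # right single quotation mark / apostrophe
    b"\x93": b'"',    # left double quotation mark
    b"\x94": b'"',    # right double quotation mark
    b"\x96": b"-",    # en dash
    b"\x97": b"--",   # em dash
    b"\x85": b"...",  # horizontal ellipsis
}

CP1252_TO_HTML = {
    b"\x91": b"&lsquo;",
    b"\x92": b"&rsquo;",
    b"\x93": b"&ldquo;",
    b"\x94": b"&rdquo;",
    b"\x96": b"&ndash;",
    b"\x97": b"&mdash;",
    b"\x85": b"&hellip;",
}

def replace_smart_chars(raw: bytes, table: dict[bytes, bytes]) -> bytes:
    """Replace each single-byte cp1252 character listed in `table`.

    Works on raw bytes, like PHP's byte-oriented str_replace. Only safe
    when `raw` really is cp1252: in UTF-8 data these byte values occur
    inside multi-byte characters and would be corrupted.
    """
    for src, dst in table.items():
        raw = raw.replace(src, dst)
    return raw
```

For example, replace_smart_chars(b"\x93Hello\x94", CP1252_TO_ASCII) returns b'"Hello"'.  Because every replacement value is plain ASCII, the order of the replacements doesn't matter.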

You may have trouble with REAL UTF-8 and UTF-16 charsets, however, as
I suspect that MS Word smart quotes may "collide" with those charsets
(codepages?) in a way that makes one indistinguishable from the other.

I.e., telling an MS Word smart quote from an umlaut in UTF-8 may
require an A.I. heuristic process rather than a definitive solution.
Sorry.
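One common heuristic for that clash, sketched here in Python under the assumption that each value is either valid UTF-8 or cp1252: try a strict UTF-8 decode first and fall back to cp1252 only if it fails.  It works for the smart quotes because bytes 0x91-0x94 can never start a UTF-8 sequence, but it is still a guess, since a short cp1252 string can accidentally be valid UTF-8.  The decode_guess name is hypothetical.

```python
def decode_guess(raw: bytes) -> tuple[str, str]:
    """Heuristic: try strict UTF-8 first, fall back to cp1252.

    Returns (text, encoding_used). A lone cp1252 smart-quote byte such
    as 0x93 is a UTF-8 continuation byte, so it fails the strict decode.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
        # undefined; errors="replace" turns them into U+FFFD.
        return raw.decode("cp1252", errors="replace"), "cp1252"
```

So decode_guess(b"\x93Hi\x94") falls back to cp1252 and yields curly quotes, while a UTF-8 encoded "Ümlaut" is left alone.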

Do realize that I only half-understand this UTF-8 stuff (okay, maybe
more like quarter-understand), so I could be entirely wrong about the
"clash".

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some starving artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

-- 
PHP General Mailing List (http://www.php.net/)
