Re: i18n maybe?

UPDATE:

What's actually in the database is:

3â¢16

So in this case, it's not i18n, and my function is doing the correct
thing.

Why the band wants those characters in their band name is beyond my
ken, but you know those flaky musicians :-)

Apologies for not catching this sooner; I was having trouble finding
the actual record among the 150,000 in the first place.

I guess I should still be concerned about those international artists
who do place non-ASCII characters in their band names, so any insight
you want to provide is still most welcome and will not be wasted.

On Wed, December 14, 2005 6:26 pm, Richard Lynch wrote:
> I have a table like this:
> artist_id | artistname  | artistname_alpha
> 1         | The Doors   |
> 2         | The The     |
> 3         | 100 Monkeys |
> 4         | 3�16   |
>
> That last artistname is not in ASCII/English...  Dunno what your email
> client is showing you, but it's:
>
> the digit 3
> capital A with umlaut
> cent sign
> capital A with circumflex
> question mark
> capital A with circumflex
> cent sign
> the digit 1
> the digit 6
>
> THAT ought to get through any email client/mta okay. :-)
>
> Now, my goal is to fill in artistname_alpha with things such as:
> Doors, The
> The, The
> one hundred monkeys
> 3�16 (???)
>
> I've written a nifty function for this:
>
> require_once 'Numbers/Words.php'; // PEAR Numbers_Words package
>
> function alpha ($string){
>   //$string = utf8_decode($string);
>
>   // Spell out dollar amounts first, e.g. "$5.00" -> toCurrency("5.00").
>   $string = preg_replace_callback('/\$([0-9.]+)/', function ($m) {
>     return Numbers_Words::toCurrency($m[1]);
>   }, $string);
>
>   // Then spell out any remaining runs of digits, e.g. "100" -> "one hundred".
>   $string = preg_replace_callback('/[0-9]+/', function ($m) {
>     return Numbers_Words::toWords($m[0]);
>   }, $string);
>
>   // Move a leading article to the end: "The Doors" -> "Doors, The".
>   // The trailing space is dropped so "The The" becomes "The, The".
>   foreach (array('The', 'An', 'A') as $article) {
>     $prefix = $article . ' ';
>     if (strcasecmp(substr($string, 0, strlen($prefix)), $prefix) === 0) {
>       return substr($string, strlen($prefix)) . ', '
>         . substr($string, 0, strlen($article));
>     }
>   }
>   return $string;
> }
>
> Now, the tricky part is that I don't really know what
> '3�16' is.
>
> It looks like it might be UTF-8, but utf8_decode() had no effect on
> it, which is why I've commented that out in the function.
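One way to find out what those characters really are, regardless of what an email client or browser displays, is to dump the raw bytes. A minimal sketch in PHP; `hex_bytes()` and `is_valid_utf8()` are hypothetical helper names, and the sample string (a "3", the UTF-8 bytes of U+2022 BULLET, then "16") is an assumption for illustration, not the actual database value.

```php
<?php
// Hypothetical helper: show every byte of a string as two hex digits,
// so the output does not depend on any display encoding.
function hex_bytes($s)
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Hypothetical helper: preg_match() with the /u modifier returns false
// when the subject is not valid UTF-8; otherwise the empty pattern matches.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Assumed sample value: "3", U+2022 BULLET encoded as UTF-8, "16".
$sample = "3\xE2\x80\xA216";
echo hex_bytes($sample), "\n"; // 33 e2 80 a2 31 36
echo is_valid_utf8($sample) ? "valid UTF-8\n" : "not valid UTF-8\n";
```

Running the same two calls on the value actually fetched from the table shows whether it holds multi-byte UTF-8 sequences or plain single-byte characters.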
>
> SO my function currently converts it to:
> 'three�sixteen'
>
> That ain't right.
>
> So, does anybody who understands this i18n stuff want to point me in
> the right direction?...
>
> Things you should know:
>
> I'm not trying to provide support for anything but English here,
> unless it's trivial to do so.
>
> The table has 150,000 rows.
>
> I have no real control over fancy MySQL settings, as it's a $20 shared
> host deal.
>
> Every day at 6 am I get a new file of this data and run it through a
> script that does an UPDATE or INSERT.  REPLACE is not suitable because
> of the primary key field size in the source data.  Anyway, I haven't
> even checked whether the function as-is will be too slow, but whatever
> I do to fix the i18n issue can't add much overhead, as it will be
> called 150,000 times every morning at 6 am.
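Since the function runs 150,000 times each morning, one low-cost approach, sketched here under the assumption that most names are plain ASCII, is to test each name with a single byte-class regex and only do extra character handling on the few that fail. `needs_special_handling()` is a hypothetical name, not part of any library.

```php
<?php
// Hypothetical fast-path test: true if the name contains any byte outside
// printable ASCII (0x20-0x7E), i.e. anything that might need extra handling.
function needs_special_handling($name)
{
    return preg_match('/[^\x20-\x7E]/', $name) === 1;
}

$names = array('The Doors', 'The The', '100 Monkeys', "3\xC2\xB716");
foreach ($names as $name) {
    // Only the last sample (which contains a UTF-8 middle dot) is flagged.
    echo $name, ': ', needs_special_handling($name) ? 'check' : 'ok', "\n";
}
```

The regex runs without the /u modifier, so it works byte by byte and never fails on malformed input.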
>
> If it helps, here is what my data-source dumps out when he encounters
> this band name:
> http://cdbaby.com/cd/316live
>
> Here is the band's web-site:
> http://316live.com/
>
> And, here, possibly, is HTML source for what somebody copied/pasted
> into the FORM to fill in the band name:
>
> 3&middot;16
>
> So, possibly, this is not i18n at all, and just somebody really really
> really silly copying and pasting an HTML entity 'middot' from their
> website into a form input and expecting it to render...
>
> Would '·' output by a browser turn into 'âÂ�¢' ???
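The effect being asked about can be reproduced. A minimal sketch, assuming the mbstring extension is available and using a hypothetical helper `misread_as()`: it takes correct UTF-8 text, reinterprets each byte as one character of a single-byte encoding, and re-encodes the result as UTF-8, which is what happens when UTF-8 data is read as Latin-1 or Windows-1252 somewhere along the way.

```php
<?php
// Hypothetical helper: treat each byte of $utf8 as one character in the
// single-byte encoding $wrong, then re-encode that as UTF-8 (mbstring).
function misread_as($utf8, $wrong)
{
    return mb_convert_encoding($utf8, 'UTF-8', $wrong);
}

$middot = "\xC2\xB7";     // U+00B7 MIDDLE DOT in UTF-8
$bullet = "\xE2\x80\xA2"; // U+2022 BULLET in UTF-8

echo misread_as("3{$middot}16", 'ISO-8859-1'), "\n";   // 3Â·16
echo misread_as("3{$bullet}16", 'Windows-1252'), "\n"; // 3â€¢16
```

So a middle dot misread this way would come out as "Â·", not the longer sequence in the question. A bullet misread as Windows-1252 gives "â€¢", which starts with "â" and ends with "¢" like the value in the UPDATE; that resemblance is suggestive, not proof of where the data came from.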
>
> If so, what can I do about it?
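If the data has been through exactly one round of UTF-8-read-as-Latin-1, it can usually be reversed. This is a heuristic sketch only, assuming mbstring and the Latin-1 case; `unmangle_latin1()` is a hypothetical name. It converts the text back to single-byte Latin-1, which recovers the original bytes, and keeps the result only when those bytes form valid UTF-8; anything else is returned untouched.

```php
<?php
// Hypothetical repair helper (heuristic): undo one round of
// "UTF-8 bytes read as ISO-8859-1". Requires mbstring.
function unmangle_latin1($s)
{
    // Only valid UTF-8 whose characters all fit in Latin-1
    // (U+0000..U+00FF) can be the product of this particular mistake.
    // preg_match() returns false on invalid UTF-8, which also bails out.
    if (preg_match('/^[\x{00}-\x{FF}]*\z/u', $s) !== 1) {
        return $s;
    }
    // Pure ASCII: nothing to undo.
    if (preg_match('/[\x{80}-\x{FF}]/u', $s) !== 1) {
        return $s;
    }
    // One byte per character: this recovers the original byte sequence.
    $bytes = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
    // Keep the recovered bytes only if they are themselves valid UTF-8.
    return preg_match('//u', $bytes) === 1 ? $bytes : $s;
}

echo unmangle_latin1("3\xC3\x82\xC2\xB716"), "\n"; // "3Â·16" becomes "3·16"
```

Because a legitimate name could occasionally pass the same test, it would be safer to log what this changes across the 150,000 rows before writing anything back. Text containing characters outside Latin-1 (such as the "€" in the Windows-1252 pattern "â€¢") is deliberately left alone here and would need separate handling.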
>
> --
> Like Music?
> http://l-i-e.com/artists.htm
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>

