Localization: problems with toupper/tolower transformations of Latin characters.

hi all,

I was trying to get proper upper/lower case conversion of characters in
a generic way, initially in C++, but I noticed that libc apparently
does not convert Latin characters correctly.

Running the attached code, which tries both the C++ and the C functions
(I later found out that libstdc++ uses libc for these), I get:

$ ./test
C++ version:
Órfão    (original string)
ÓRFãO    (should be all upper case)
Órfão    (should be all lower case)

C version:
Órfão
ÓRFãO
Órfão
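
A likely explanation, assuming the source file and the terminal both use
UTF-8 as stated: in UTF-8, "Órfão" is 5 characters but 7 bytes, because
Ó (U+00D3) is encoded as the two bytes C3 93 and ã (U+00E3) as C3 A3.
The transforms in the attached code work on one char, that is one byte,
at a time, and none of those bytes is a letter on its own, so only the
ASCII letters r, f and o change case, which matches the output above.
The helper below (utf8_code_points is a hypothetical name) illustrates
the byte/character mismatch by counting code points, skipping UTF-8
continuation bytes (bit pattern 10xxxxxx):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of code points in a well-formed UTF-8
// string. Each code point starts with exactly one byte that is NOT a
// continuation byte (continuation bytes look like 10xxxxxx).
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With a UTF-8 source file, std::string("Órfão").size() is 7 while
utf8_code_points("Órfão") is 5.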


Any ideas about what I could be missing? Or is the library missing something?

(I tried different locales, as the commented-out lines in the code show,
and I am using UTF-8 as the encoding.)
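
Since the strings are UTF-8, one common approach in C is to convert the
bytes to wide characters with mbstowcs, change the case of each wchar_t
with towupper/towlower, and convert back with wcstombs. A minimal
sketch, assuming setlocale has already selected a UTF-8 LC_CTYPE locale
that is actually installed (for example en_US.UTF-8); mb_change_case is
a hypothetical name:

```cpp
#include <clocale>   // setlocale: the caller must select a UTF-8 LC_CTYPE first
#include <cstdlib>   // mbstowcs, wcstombs, MB_CUR_MAX
#include <cwctype>   // towupper, towlower, wint_t
#include <string>
#include <vector>

// Hypothetical helper: change the case of a multibyte string (UTF-8 in a
// UTF-8 locale) according to the current C locale. Returns false if the
// input or the result cannot be converted in that locale.
bool mb_change_case(const std::string& in, bool upper, std::string& out)
{
    // A multibyte string never has more characters than bytes, so this
    // buffer also has room for the terminating L'\0'.
    std::vector<wchar_t> wide(in.size() + 1);
    std::size_t n = std::mbstowcs(&wide[0], in.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1))
        return false;

    // Each wchar_t now holds a whole character, e.g. L'ã' instead of C3 A3.
    for (std::size_t i = 0; i < n; ++i) {
        std::wint_t wc = static_cast<std::wint_t>(wide[i]);
        wide[i] = static_cast<wchar_t>(upper ? std::towupper(wc)
                                             : std::towlower(wc));
    }

    // At most MB_CUR_MAX bytes per character, plus the terminating '\0'.
    std::vector<char> narrow(n * MB_CUR_MAX + 1);
    std::size_t m = std::wcstombs(&narrow[0], &wide[0], narrow.size());
    if (m == static_cast<std::size_t>(-1))
        return false;
    out.assign(&narrow[0], m);
    return true;
}
```

For example, once setlocale(LC_ALL, "en_US.UTF-8") has succeeded,
mb_change_case("Órfão", true, out) should leave "ÓRFÃO" in out.
towupper/towlower map one character to one character, so multi-character
mappings such as German ß to SS are not handled this way.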


thanks in advance for any help/pointers!

- jan

ps.:
$ gcc --version
gcc (GCC) 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu9)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



#include <iostream>
#include <locale>
#include <string>
#include <algorithm>
#include <cctype>      // old <ctype.h>: ::toupper, ::tolower
#include <cstring>     // strcpy, strlen
#include <clocale>     // setlocale

using namespace std;

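// Function objects applying the locale-aware std::toupper/std::tolower
// (the std::ctype<char> facet of the stored locale) to one char at a time.
struct MyToUpper
{
	MyToUpper(std::locale const& l) : loc(l) {}
	char operator() (char c) const  { return std::toupper(c, loc); }
	private:
		std::locale loc;   // stored by value so it cannot dangle
};

struct MyToLower
{
	MyToLower(std::locale const& l) : loc(l) {}
	char operator() (char c) const  { return std::tolower(c, loc); }
	private:
		std::locale loc;   // stored by value so it cannot dangle
};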


int main(int argc, char** argv)
{
	// locale loc( "" );
	// locale loc( "en_US.UTF-8" );
	locale loc( "pt_BR.UTF-8" );
	MyToUpper up( loc );
	MyToLower down( loc );

	cout << "C++ version: " << endl;
	const char *reference = "Órfão";
	string normal = reference;
	cout << normal << endl;
	transform( normal.begin(), normal.end(), normal.begin(), up );
	cout << normal << endl;
	transform( normal.begin(), normal.end(), normal.begin(), down );
	cout << normal << endl;


	// C version
	//setlocale(LC_ALL, "");
	//setlocale(LC_ALL, "pt_BR.UTF-8");
	setlocale(LC_ALL, "en_US.UTF-8");
	cout << endl << "C version: " << endl;
	char buffer[256];
	strcpy( buffer, reference );
	char *buffer_end = buffer + strlen(buffer);
	cout << buffer << endl;
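	// ::toupper/::tolower expect a value representable as unsigned char
	// (or EOF); passing a plain char with a negative value, such as a
	// UTF-8 byte >= 0x80, is undefined behavior, so cast each byte first.
	for (char *p = buffer; p != buffer_end; ++p)
		*p = static_cast<char>( ::toupper( static_cast<unsigned char>(*p) ) );
	cout << buffer << endl;
	for (char *p = buffer; p != buffer_end; ++p)
		*p = static_cast<char>( ::tolower( static_cast<unsigned char>(*p) ) );
	cout << buffer << endl;

	return 0;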
}
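
On the C++ side, the same idea can be expressed with the
std::ctype<wchar_t> facet of a std::locale instead of the
std::ctype<char> facet used by MyToUpper/MyToLower. A minimal sketch,
assuming the text is already held in a std::wstring (for example a wide
literal such as L"Órfão" compiled from a UTF-8 source file) and that the
named locale is installed; change_case_wide is a hypothetical name:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: change the case of a wide string using the
// ctype<wchar_t> facet of `loc`. A wchar_t holds a whole character,
// so 'ã' is one element here rather than two UTF-8 bytes.
std::wstring change_case_wide(std::wstring s, const std::locale& loc, bool upper)
{
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    if (!s.empty()) {
        wchar_t* first = &s[0];
        wchar_t* last = first + s.size();
        if (upper)
            ct.toupper(first, last);   // converts [first, last) in place
        else
            ct.tolower(first, last);
    }
    return s;
}
```

Constructing std::locale("pt_BR.UTF-8") throws std::runtime_error if
that locale is not installed. Converting between a UTF-8 std::string
and std::wstring is not shown here; std::mbstowcs/std::wcstombs after
setlocale is one option.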
