On Thu, 26.08.2004, at 22:36, Gerard Samuel wrote:
> My site/code/database is developed primarily for the English language.
> I've had people from "The Far East" add content to my site using their
> native language, and it is displaying properly on the site.
> But I'm a bit concerned about the number of characters these languages use.
> For example, I've had someone enter ->
> chinese testing 中文
>
> It is saved in the database as ->
> chinese testing 中文

Your web page uses a character set that does not contain Chinese
characters, so the browser sends them as HTML numeric character
references instead (for example, &#20013; for 中). As you correctly
observed, each of these references takes up more than one (Latin,
ASCII) character.

> Now, forgive my ignorance, I have no idea what the additional
> Chinese characters mean, but from the values in the database, I'm
> assuming that it amounts to 3 characters.
> But if I'm correct that those are 3 characters, it is
> using up 24 characters in a column.
>
> My concern is that if I were to limit a column to, say, 25 "English"
> characters, and a Chinese fellow comes by and hypothetically says
> "Hello World" in Chinese and goes over the limit of the column, the data
> will be truncated.

PostgreSQL will not truncate the data; it rejects it with an error.
But the general point is correct.

> Is there anything that can be done to overcome this shortcoming?
>
> I'm currently using PostgreSQL 7.4.2 with SQL_ASCII as the database
> character set, FreeBSD 4.10, and PHP 4.3.6.

Change your site to use a character set that includes Chinese
characters, for example Unicode. The most common encoding of Unicode on
the web is UTF-8. It is also the encoding PostgreSQL uses when you set
the database encoding to UNICODE.

If you switch your site to UTF-8 and want varchar(25) to mean 25
characters rather than 25 bytes, you also have to change the database
encoding to UNICODE.

-- 
Markus Bertheau <twanger@xxxxxxxxxxxxxx>
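To make the length arithmetic concrete, here is a small Python sketch (not part of the original thread; the function names `as_html_entities` and `storage_lengths` are made up for illustration). It assumes the browser sends decimal references of the form &#NNNNN; for every non-ASCII character, and that a UNICODE-encoded database counts varchar(n) in characters while SQL_ASCII counts bytes, as described above. Common Chinese characters have five-digit decimal code points, so each reference is 8 ASCII characters long; three of them take 24, which matches the estimate in the quoted message. The phrase "你好世界" is "Hello World" in Chinese.

```python
# Sketch: compare how much room a short Chinese phrase needs when it is
# stored (a) as HTML numeric character references in an SQL_ASCII database,
# (b) as raw UTF-8 bytes, and (c) as Unicode characters, which is what
# varchar(n) counts in a UNICODE-encoded database.
# Function names are hypothetical, chosen only for this illustration.


def as_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference like &#20013;."""
    return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in text)


def storage_lengths(text: str) -> dict:
    """Return the length of `text` under the three storage schemes."""
    return {
        "characters": len(text),                        # UNICODE database
        "utf8_bytes": len(text.encode("utf-8")),        # raw UTF-8 bytes
        "entity_chars": len(as_html_entities(text)),    # SQL_ASCII + references
    }


if __name__ == "__main__":
    phrase = "你好世界"  # "Hello World" in Chinese
    print(as_html_entities(phrase))
    print(storage_lengths(phrase))
```

With a UNICODE database the phrase counts as 4 characters and fits easily in varchar(25); stored as references in an SQL_ASCII database it needs 32 characters and would exceed that limit.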