On 8/19/06, John Gunther <owner@xxxxxxxxxxxxxxxx> wrote:
I've been reading about locales, encodings, sort orders, and the to_ascii function and, embarrassingly, I'm more confused than enlightened. What I want is very simple:

1) I want the database to correctly accept, store, and display alphabetic characters, including European accented characters, in HTML forms.
2) I want sorting to ignore the diacritical marks so that, for example, u, u-accent, and u-umlaut are all sorted as if they were plain u.
3) I want sorting to ignore non-alphanumerics, letter case, and white space.

To illustrate, the following data is in sorted order:

St-Émile
stendahl
st ènders
St. Epson

Can someone tell me what combination of PostgreSQL and Linux settings I need for this? It seems like a very basic question, but I'm just dense, I guess. I've tried a half dozen time-consuming configs without success.
Well, you'll obviously have to use UTF-8 if you plan on supporting more than one language with different accented characters.

The sorting issue is a bit of a problem, though. PostgreSQL uses the same collation in all databases in a database cluster (carved into stone at cluster init), so I don't know of a good way to collate your data. You could conceivably keep a copy of each accented string, with the accented characters replaced by their non-accented counterparts as you see fit, and sort on that column, but that's not a very elegant way of handling the problem, is it?

You might have more luck with another database like MySQL 4.1+ (where accent-insensitive UTF-8 collation is directly supported), MS SQL (where you can define encoding and collation settings at the database level, and so conceivably have a database for each language, if you know exactly which languages you'll have) or Firebird (where you define an encoding at the column level and can collate any way you wish in each column).

Hope I've helped,
t.n.a.
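
For what it's worth, the "copy with the accents replaced" idea from the reply above could be sketched roughly like this on the application side. This is only an illustration in Python, not anything PostgreSQL provides: the function name sort_key and the list names are made up for the example, and it assumes every accented letter you care about decomposes under Unicode NFKD (letters such as "ø" do not and would need their own mapping).

```python
import unicodedata


def sort_key(text: str) -> str:
    """Build a comparison key that ignores accents, case, spaces and punctuation.

    Hypothetical helper for illustration; the result would be stored in a
    separate column and used only for ORDER BY.
    """
    # Split accented letters into base letter + combining mark (e.g. "É" -> "E" + U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case, then keep only letters and digits.
    return "".join(ch for ch in without_marks.casefold() if ch.isalnum())


# Sample data from the question, deliberately shuffled.
names = ["St. Epson", "st ènders", "stendahl", "St-Émile"]
ordered = sorted(names, key=sort_key)

for name in ordered:
    print(f"{sort_key(name):10} {name}")
```

The key would go into a second column next to the original string, so the display value stays untouched and only the ordering uses the stripped copy. Sorting on a plain ASCII key like this also sidesteps the cluster-wide collation, at the cost of keeping the two columns in sync.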