Frank Sweetser wrote:
> Unless, of course, you're at a good sized school with lots of
> international students, and have fileservers holding filenames created
> on desktops running in Chinese, Turkish, Russian, and other locales.

What I struggle with here is why they're not using ru_RU.UTF-8, zh_CN.UTF-8, etc. as their locales. Why mix charsets?

I don't think these people should be forced to use a UTF-8 database and encoding conversion if they want to mix and match charsets for file-name chaos on their machines, though. I'd just like to be able to back up systems that _do_ have consistent charsets in ways that let me later reliably search for files by name, restore to any host, and so on.

Perhaps I'm strange in thinking that all this mix-and-match encoding stuff is bizarre and backward. The Mac OS X and Windows folks seem to agree, though: let the file system store Unicode data, and translate at the file system or libc layer for applications that insist on using other encodings.

I do take Greg Stark's point (a), though. As *nix systems stand, solutions will only ever be mostly-works, not always-works, which I agree isn't good enough. Since there's no sane agreement about encodings on *nix systems, and everything is just byte strings that different apps can interpret in different ways under different environmental conditions, we may as well throw up our hands in disgust and give up trying to do anything sensible.

The alternative is saying that files the file system considers legal can't be backed up because of their names, which I agree isn't OK. The system shouldn't permit those files to exist either, but I suspect we'll have this borked encoding-agnostic wackiness for as long as we have *nix systems at all, since nobody will ever agree on anything for long enough to change it. Sigh. This is about the only time I've ever wished I was using Windows (or Mac OS X).

Also: Greg, your point (c) goes two ways.
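For anyone following along, here's a minimal Python 3 sketch of the two failure modes in play: the same on-disk filename bytes "mean" different things under different encodings, and even within UTF-8, canonically equivalent names need not be byte-equal, so byte-wise searches (e.g. against an SQL_ASCII database) can miss files. The byte values and names are purely illustrative, not from any real backup.

```python
import unicodedata

# 1. The same byte string decodes to different text under different encodings.
raw = b'\xc3\xa9'            # filename bytes as they might sit on disk
print(raw.decode('utf-8'))   # one character: 'é'
print(raw.decode('latin-1')) # two characters: 'Ã©'

# 2. Canonically equivalent Unicode strings are not byte-equal, so a
# byte-wise filename comparison can fail even with consistent UTF-8.
nfc = unicodedata.normalize('NFC', 'caf\u00e9')  # 'é' as one code point
nfd = unicodedata.normalize('NFD', 'caf\u00e9')  # 'e' plus combining acute
print(nfc == nfd)            # False: same text, different byte sequences
print(unicodedata.normalize('NFC', nfd) == nfc)  # True once both normalized
```

A backup tool that recorded the source encoding and normalized filenames before comparison would sidestep both problems, at least for hosts with consistent charsets.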
If I can't trust my backup software to restore my filenames from one host exactly correctly to another host that may have configuration differences not reflected in the backup metadata, a different OS revision, and so on, then what good is it for disaster recovery? How do I even know what those byte strings *mean*? Bacula doesn't record the default system encoding with backup jobs, so there's no way for even the end user to try to fix up the file names for a different encoding; you're faced with some byte strings in wtf-is-this-anyway encoding and guesswork. Even recording LC_CTYPE in the backup job metadata and offering the _option_ to convert encodings on restore would be a big step (though it wouldn't fix searches by filename failing to match because of encoding mismatches).

Personally, I'm just going to stick to a UTF-8-only policy for all my hosts and work around the limitation that way. It has worked OK so far, though I don't much like that different normalizations of Unicode won't compare equal under SQL_ASCII, so I can't reliably search for file names.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general