
Re: Catastrophic changes to PostgreSQL 8.4

On 2/12/2009 9:18 PM, Kern Sibbald wrote:
> Hello,
>
> I am the project manager of Bacula.  One of the database backends that Bacula
> uses is PostgreSQL.

As a Bacula user (though I'm not on the Bacula lists), first - thanks for all your work. It's practically eliminated all human intervention from something that used to be a major pain. Configuring it to handle the different backup frequencies, retention periods and diff/inc/full needs of the different data sets was a nightmare, but once set up it's been bliss. The 3.x `Accurate' mode is particularly nice.

> Bacula sets the database encoding to SQL_ASCII, because although
> Bacula "supports" UTF-8 character encoding, it cannot enforce it.  Certain
> operating systems such as Unix, Linux and MacOS can have filenames that are
> not in UTF-8 format.  Since Bacula stores filenames in PostgreSQL tables, we
> use SQL_ASCII.
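To illustrate the problem described above, here is a minimal Python sketch, assuming a file named café was created on a client whose locale uses ISO-8859-1 (Latin-1) and that its name reaches the backup server as raw bytes. Those bytes are not valid UTF-8, which is why a UTF8-encoded database would reject them with an "invalid byte sequence" error. The helper `is_valid_utf8` is hypothetical, not part of Bacula.

```python
# "café" as written to disk by a client using an ISO-8859-1 (Latin-1) locale
latin1_name = "café".encode("iso-8859-1")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_name)                 # b'caf\xe9'
# 0xE9 announces a 3-byte UTF-8 sequence, but the string ends right after it
print(is_valid_utf8(latin1_name))  # False
```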

I noticed that while doing some work on the Bacula database a while ago.

I was puzzled at the time about why Bacula does not translate file names from the source system's encoding to utf-8 for storage in the database, so that all file names are known to be sane and in a known encoding.

Because Bacula does not store the encoding or seem to transcode the file name to a single known encoding, it does not seem to be possible to retrieve files by name if the bacula console is run on a machine with a different text encoding to the machine the files came from. After all, café in utf-8 is a different byte sequence to café in ISO-8859-1, and the two won't match in equality tests under SQL_ASCII.
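The mismatch can be shown at the byte level. SQL_ASCII stores high-bit bytes without validating or converting them, so comparing the two byte strings in Python is a reasonable model of what an equality test on the filename column sees:

```python
# The same visible name, "café", in two encodings
utf8_bytes = "café".encode("utf-8")         # é becomes two bytes, 0xC3 0xA9
latin1_bytes = "café".encode("iso-8859-1")  # é becomes one byte, 0xE9

print(utf8_bytes)                  # b'caf\xc3\xa9'
print(latin1_bytes)                # b'caf\xe9'
# A byte-wise comparison never matches one against the other
print(utf8_bytes == latin1_bytes)  # False
```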

Additionally, I'm worried that restoring to a different machine with a different encoding may fail, and that if it doesn't, it will result in hopelessly mangled file names. This wouldn't be fun to deal with during disaster recovery. (I don't yet know if there are provisions within Bacula itself to deal with this and need to do some testing.)

Anyway, it'd be nice if Bacula would convert file names to utf-8 at the file daemon, using the encoding of the client, for storage in a utf-8 database.
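A minimal sketch of that conversion, assuming the file daemon receives each name as raw bytes and already knows the client's encoding. `to_utf8_filename` is a hypothetical helper, not an existing Bacula function:

```python
def to_utf8_filename(raw: bytes, client_encoding: str) -> bytes:
    """Re-encode a raw filename from the client's encoding to UTF-8 (hypothetical helper).

    Decoding is strict, so a name that is not valid in client_encoding raises
    UnicodeDecodeError instead of being silently mangled. Note that every byte
    sequence is valid ISO-8859-1; errors only arise for stricter encodings
    such as UTF-8 or EUC-JP.
    """
    return raw.decode(client_encoding).encode("utf-8")


# A Latin-1 client's "café" becomes the UTF-8 bytes a UTF8 database would store
print(to_utf8_filename(b"caf\xe9", "iso-8859-1"))  # b'caf\xc3\xa9'
```

Strict decoding is the deliberate choice here: a name that cannot be interpreted in the client's encoding is reported rather than replaced with substitute characters.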

Mac OS X (HFS Plus) and Windows (NTFS) systems store file names as Unicode (UTF-16 IIRC). Unix systems increasingly use utf-8, but may use other encodings. If a unix system does use another encoding, this may be determined from the locale in the environment and used to convert file names to utf-8.
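On a Unix client, the locale-derived encoding can be read from the environment. A sketch in Python, assuming the file daemon runs under the same locale as the users who created the files; `client_filename_encoding` is a hypothetical name:

```python
import codecs
import locale


def client_filename_encoding() -> str:
    """Return the canonical codec name for the current LC_CTYPE locale (hypothetical helper)."""
    # On POSIX systems Python 3 initialises LC_CTYPE from the environment at
    # startup, so getpreferredencoding(False) reports the encoding of the
    # user's locale, e.g. 'UTF-8' under en_US.UTF-8.
    # codecs.lookup() normalises aliases such as 'UTF-8' to 'utf-8'.
    return codecs.lookup(locale.getpreferredencoding(False)).name


print(client_filename_encoding())  # depends on the environment
```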

Windows systems using FAT32 and Mac OS 9 machines on plain old HFS will have file names in the locale's encoding, like UNIX systems, and are fairly easily handled.

About the only issue I see is that systems may have file names that are not valid text strings in the current locale, usually due to buggy software butchering text encodings. I guess a *nix system _might_ have different users running with different locales and encodings, too. The latter case doesn't seem easy to handle cleanly, as file names on unix systems carry no record of the encoding they were written in. I'm not really sure these cases actually show up in practice, though.

Personally, I'd like to see Bacula capable of using a utf-8 database, with proper encoding conversion at the fd for non-utf-8 encoded client systems. It'd really simplify managing backups for systems with a variety of different encodings.

( BTW, one way to handle incorrectly encoded filenames and paths might be to have a `bytea' field that's generally null to store such mangled file names. Personally though I'd favour just rejecting them. )
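The two policies in that aside can be sketched together. Assuming a hypothetical table with a UTF-8 `name` column and a normally NULL `bytea` column `raw_name` (neither is part of Bacula's actual schema), the hypothetical `filename_columns` decodes strictly and either rejects an undecodable name or keeps its exact bytes in the fallback column:

```python
from typing import Optional, Tuple


def filename_columns(raw: bytes, client_encoding: str,
                     reject: bool = True) -> Tuple[Optional[str], Optional[bytes]]:
    """Split a raw filename into hypothetical (name, raw_name) column values.

    Exactly one of the two is non-None. With reject=True, a name that cannot
    be decoded raises UnicodeDecodeError instead of being stored.
    """
    try:
        return raw.decode(client_encoding), None
    except UnicodeDecodeError:
        if reject:
            raise
        # Keep the exact bytes so the file can still be restored under its original name
        return None, raw


print(filename_columns(b"caf\xc3\xa9", "utf-8"))            # ('café', None)
print(filename_columns(b"caf\xe9", "utf-8", reject=False))  # (None, b'caf\xe9')
```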

> We set SQL_ASCII by default when creating the database via the command
> recommended in recent versions of PostgreSQL (e.g. 8.1), with:
>
> CREATE DATABASE bacula ENCODING 'SQL_ASCII';
>
> However, with PostgreSQL 8.4, the above command is ignored because the default
> template database copied is not template0.

It's a pity that attempting to specify an encoding other than the safe one, when copying a template database other than template0, doesn't cause the CREATE DATABASE command to fail with an error.

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
