Re: the impact of encoding on performance.

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 10 Mar 2005 18:46:13 -0500

Michael Ben-Nes <miki@xxxxxxxxxxxx> writes:
>>  The drawback of using locales other than C or POSIX in PostgreSQL is 
>> its performance impact. It slows character handling and prevents 
>> ordinary indexes from being used by LIKE. For this reason use locales 
>> only if you actually need them.

> What is the impact of the locale on the server? Is it irrelevant, small
> or huge?

> Does the encoding of the DB impact performance too? UTF8, 8859-8?

These aren't really separable since you generally don't get to choose
the encoding independently of the locale.

I'm working on some simple benchmarking consisting of running mysql's
sql-bench against a PG 8.0.1 server on a Fedora Core 3 machine.  Mostly
I'm interested in understanding in detail why sql-bench makes us look
so bad, but as long as I'm at it, it can provide one datapoint in answer
to your question.  In two runs that were identical except one used
en_US.utf8 locale and UTF8 encoding while the other used C locale and
SQL_ASCII encoding, most of the tests didn't show any meaningful
difference, but a couple of tests showed as much as a 2X advantage for C
locale.  These were tests that were heavily dependent on comparison of
strings, such as a SELECT COUNT(DISTINCT foo) across a large table.
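To see why a query like COUNT(DISTINCT foo) is so sensitive to the cost of a single string comparison, here is a minimal Python sketch of a sort-then-scan distinct count. It is a simplified illustration under assumptions, not PostgreSQL's executor code: `count_distinct` and `c_compare` are hypothetical names, and `c_compare` only mimics strcmp()-style byte ordering. The sort performs on the order of n log n comparisons, so whatever each comparison costs is multiplied by that count.

```python
import functools


def count_distinct(values, compare):
    """Sort-then-scan COUNT(DISTINCT): sort, then count runs of equal neighbours.

    `compare(a, b)` must return <0, 0 or >0. Returns a tuple
    (distinct_count, number_of_compare_calls) so the comparison load is visible.
    """
    calls = 0

    def counting_compare(a, b):
        nonlocal calls
        calls += 1
        return compare(a, b)

    ordered = sorted(values, key=functools.cmp_to_key(counting_compare))
    distinct = 0
    for i, v in enumerate(ordered):
        # A new group starts whenever a value differs from its predecessor.
        if i == 0 or counting_compare(ordered[i - 1], v) != 0:
            distinct += 1
    return distinct, calls


def c_compare(a, b):
    """Byte-order comparison of UTF-8 bytes, analogous to strcmp() in the C locale."""
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ab > bb) - (ab < bb)
```

To approximate the non-C case, one could call `locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")` (assuming that locale is installed) and pass `locale.strcoll` as `compare`; the number of calls stays about the same, but each call goes through the libc collation routines instead of a plain byte comparison.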

So it would depend on your workload.  Certainly it's possible that
locale would make a big difference to you, but it might not.

Also, this all depends quite a bit on how efficiently your libc
implements strcoll() for non-C locales.  I believe there are some
platforms out there that are much slower than glibc, and would have
a correspondingly higher penalty for using a non-C locale.  You could
investigate this by timing "sort" on a large file in both locales.
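The "sort" experiment can be scripted roughly as follows. This is a sketch under stated assumptions: it assumes a POSIX `sort` on the PATH and that a locale named `en_US.utf8` is installed (check with `locale -a`). If that locale is missing, GNU sort typically falls back to C silently and the two timings will look alike. The helper names `make_test_file` and `time_sort` are hypothetical.

```python
import os
import random
import string
import subprocess
import tempfile
import time


def make_test_file(path, n_lines=200_000, width=12):
    """Write n_lines random mixed-case alphanumeric strings, one per line."""
    rng = random.Random(42)  # fixed seed so every run sorts identical input
    alphabet = string.ascii_letters + string.digits
    with open(path, "w") as f:
        for _ in range(n_lines):
            f.write("".join(rng.choice(alphabet) for _ in range(width)) + "\n")


def time_sort(path, locale_name):
    """Return wall-clock seconds for `sort` on path under the given locale."""
    env = dict(os.environ, LC_ALL=locale_name)  # LC_ALL overrides LANG and LC_COLLATE
    start = time.perf_counter()
    subprocess.run(["sort", path], env=env, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "words.txt")
        make_test_file(data)
        for loc in ("C", "en_US.utf8"):  # assumed locale name; verify with `locale -a`
            print(f"{loc:12s} {time_sort(data, loc):.3f} s")
```

Mixed-case input is used on purpose. In the C locale the comparison is a plain byte compare, while a locale such as en_US orders case and accents in separate passes, so strcoll() has more work to do per comparison.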

			regards, tom lane
