Re: The contents of the pg_timezone_names view bring some surprises

Bryn Llewellyn <bryn@xxxxxxxxxxxx> · Sat, 22 May 2021 17:54:07 -0700

Tom, David, Adrian, and Peter—thank you all very much for taking an interest in my questions. Your responses have, collectively, been an enormous help. I deleted the text of the exchanges in this particular branch of the overall discussion because it's become rather difficult to work out who said what in response to what. It's all there in the "pgsql-general" email archive. 

Here's my summary (in no particular order) of what I've learned. Please feel free to ignore it.

----------------------------------------------------------------------

(1) In the context of discussing timezones, the English word "abbreviation" maps to two distinct notions in PG: "pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom earlier said « there are two views [because there are] two sources of truth involved ». This really means that these two columns list different classes of facts. It's therefore unfortunate that they both have the same name.

----------------------------------------------------------------------

(2) It's no wonder, therefore, that I was confused. Anybody with even a slight exposure to relational design would guess that "pg_timezone_names.abbrev" is a FK to the PK in "pg_timezone_abbrevs.abbrev". And they might wonder why "utc_offset" and "is_dst" seem to be denormalized. But they'd probably recall that such things are common, as a usability convenience, in views. Anyway, I'm beyond that confusion now.

----------------------------------------------------------------------

(3) It helps me to think of "pg_timezone_names.abbrev" as "nickname"—which notion is unique, just for each name. It's useful only informally, and only when a particular timezone observes DST, as a shorthand for disambiguation. I used the example earlier:

set timezone = 'America/Los_Angeles';
select
  to_char('2021-11-07 08:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "1st 1:30",
  to_char('2021-11-07 09:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "2nd 1:30";

gets:

      1st 1:30      |      2nd 1:30      
--------------------+--------------------
 01:30:00 PDT (-07) | 01:30:00 PST (-08)

The fact that you can find another nickname spelled "PST" in "pg_timezone_names" doesn't matter: This (as of this email's time stamp):

select name, abbrev, utc_offset from pg_timezone_names where abbrev = 'PST';

gets this:

    name     | abbrev | utc_offset 
-------------+--------+------------
 Asia/Manila | PST    | 08:00:00

The fact that Asia/Manila happens not to observe DST makes my example a little less powerful. Never mind, their politicians could decide to introduce it presently and to give DST the nickname BST (or DOG). It's remarkable, in itself, that "ST" in "BST" means "Summer Time" but that it means "Standard Time" in "PST". But this nicely unerlines the point that there's no rhyme or reason in the design of these abbreviations.

Notably, the "nickname" in "pg_timezone_names" has nothing at all (formally) to do with "abbrev" in "pg_timezone_abbrevs".

----------------------------------------------------------------------

(4) My realization, as set out in #3, helps me now to understand the rule:

« PostgreSQL allows you to specify time zones in three different forms... A time zone abbreviation, for example PST [but ONLY] in date/time input values and with the AT TIME ZONE operator… Such a specification merely defines a particular offset from UTC... The recognized abbreviations are listed in the pg_timezone_abbrevs view »

This (to my embarrassment, now) does say that a "nickname" from "pg_timezone_names" is not allowed as the argument for the AT TIME ZONE operator. But it doesn't adumbrate that the English word "abbreviation" means what I now understand it to mean. You need to get the point from one terse sentence without the luxury of examples to strengthen the explanation.

I mentioned somewhere that I had discovered that this:

select * from pg_timezone_names where abbrev = 'CAT';

gets lots of rows (all with utc_offset = '02:00'::interval) but that this:

select * from pg_timezone_abbrevs where abbrev = 'CAT';

gets no rows. This informs the test that Tom suggested that I might do:

select '2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'CAT';

It causes this error:

22023: time zone "CAT" not recognized

Notice that this:

select '2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'CAT42';

silently succeeds. Sure enough, the doc does say « PostgreSQL will accept POSIX-style time zone specifications, as described in Section B.5. [as the argument for AT TIME ZONE] » But what a quagmire of confusion this is. I said elsewhere that the same aim (get the local time at -42 hours w.r.t. UTC) can be met with transparent syntax, thus:

select ('2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time zone 'UTC') - '42:00'::interval;

I know which syntax I prefer!

----------------------------------------------------------------------

(5) I was embarrassingly slow to find this in the doc for the SET command:

« Timezone settings given as numbers or intervals are internally translated to POSIX timezone syntax. For example, after SET TIME ZONE -7, SHOW TIME ZONE would report <-07>+07. »

Search tools are famously feeble for finding such things. This instructs the careful reader to beware of this trap:

-- OK.
set timezone = 'America/Los_Angeles';

-- Clear error 22023: invalid value for parameter "TimeZone".
set timezone = 'America/Los_Angelez';

-- Silently succeeds.
set timezone = 'America/Los_Angele9';

(David pointed this out.) This is confusing in itself. And it's compounded by this fact: POSIX uses positive offsets to denote what both the utc_offset column and to_char(<some timestamptz value>), with the format element TH:TM, shows as negative. But it is what it is.

----------------------------------------------------------------------

(6) To the extent that a catalog view can have a business unique key (and I believe that this must be the case for it to be useful), the unique key for "pg_timezone_names" is "name" and the unique key for "pg_timezone_abbrevs" is "abbrev". I'm going to assume that PG has code to maintain these invariants. This gives me the way to interpret the statement "an [abbreviation] specification merely defines a particular offset from UTC".

Peter showed us a query like this:

select count(*) from (
  select abbrev from pg_timezone_abbrevs
  group by abbrev
  having count(*) > 1) as a;

In my un-tampered-with PG 13.2 env, it gets a count of zero. I like this.

Adrian pointed me to the "B.4. Date/Time Configuration Files" Appendix here:

https://www.postgresql.org/docs/current/datetime-config-files.html

and to this in particular:

«
The @OVERRIDE syntax indicates that subsequent entries in the file can override previous entries (typically, entries obtained from included files). Without this, conflicting definitions of the same timezone abbreviation are considered an error.
»

So even without "timezone_abbreviations" set to "default" at session level, "pg_timezone_abbrevs.abbrev" will always be unique.

----------------------------------------------------------------------

(7) There are two different syntaxes for setting the timezone session environment variable. This (or with TO instead of =):

set timezone =

and this:

set time zone

This, too, is hugely confusing. (Correspondingly, there's both "show timezone" and "show time zone".)

The first "set" (and "show") alternative is consistent with how you set all other session environment variables. Moreover, all are observable with the current_setting() built-in function. And there the argument "timezone" can only be spelled as one word. This makes me favor the "set timezone" spelling.

However, only the "set time zone" spelling allows an argument that's an explicit interval value like this:

set time zone interval '-08:00';

I said "explicit interval value" because this isn't true. For example, these two fail:

set time zone '-08:00'::interval;

and

set time zone make_interval(hours=>-8);

both fail, even though each uses a legal interval value. This is confusion on top of confusion. All the more reason to avoid it.