Re: The contents of the pg_timezone_names view bring some surprises

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Sun, 23 May 2021 08:44:58 -0700

On 5/22/21 5:52 PM, Bryn Llewellyn wrote:
Tom, David, Adrian, and Peter—thank you all very much for taking an 
interest in my questions. Your responses have, collectively, been an 
enormous help. I deleted the text of the exchanges in this particular 
branch of the overall discussion because it's become rather difficult to 
work out who said what in response to what. It's all there in the 
"pgsql-general" email archive.

Here's my summary (in no particular order) of what I've learned. Please 
feel free to ignore it.

----------------------------------------------------------------------

(1) In the context of discussing timezones, the English word 
"abbreviation" maps to two distinct notions in PG: 
"pg_timezone_names.abbrev"; and "pg_timezone_abbrevs.abbrev". Tom 
earlier said « there are two views [because there are] two sources of 
truth involved ». This really means that these two columns list 
different classes of facts. It's therefore unfortunate that they both 
have the same name.

They are reporting the same facts, just in different context. 
pg_timezone_abbrevs.abbrev is a list of un-duplicated, as I found out, 
abbreviations. pg_timezone_names.abbrev shows the abbreviations or 
offsets in effect at CURRENT_TIMESTAMP.

----------------------------------------------------------------------

(2) It's no wonder, therefore, that I was confused. Anybody with even a 
slight exposure to relational design would guess 
that "pg_timezone_names.abbrev" is a FK to the PK in 
"pg_timezone_abbrevs.abbrev". And they might wonder why "utc_offset" and 
"is_dst" seem to be denormalized. But they'd probably recall that such 
things are common, as a usability convenience, in views. Anyway, I'm 
beyond that confusion now.

Since the views are based on functions that are displaying different 
context I don't find it surprising.

----------------------------------------------------------------------

(3) It helps me to think of "pg_timezone_names.abbrev" as 
"nickname"—which notion is unique, just for each name. It's useful only 
informally, and only when a particular timezone observes DST, as a 
shorthand for disambiguation. I used the example earlier:

The fact that Asia/Manila happens not to observe DST makes my example a 
little less powerful. Never mind, their politicians could decide to 
introduce it presently and to give DST the nickname BST (or DOG). It's 
remarkable, in itself, that "ST" in "BST" means "Summer Time" but that 
it means "Standard Time" in "PST". But this nicely unerlines the point 
that there's no rhyme or reason in the design of these abbreviations.

Notably, the "nickname" in "pg_timezone_names" has nothing at all 
(formally) to do with "abbrev" in "pg_timezone_abbrevs".

Yes it does. It just there is not always a one-to-one correspondence 
between the two. For instance duplicate abbreviations do exist but they 
are filtered out of pg_timezone_abbrevs.

----------------------------------------------------------------------

(4) My realization, as set out in #3, helps me now to understand the rule:

« PostgreSQL allows you to specify time zones in three different 
forms... A time zone abbreviation, for example PST [but ONLY] in 
date/time input values and with the AT TIME ZONE operator… Such a 
specification merely defines a particular offset from UTC... The 
recognized abbreviations are listed in the pg_timezone_abbrevs view »

This (to my embarrassment, now) does say that a "nickname" from 
"pg_timezone_names" is not allowed as the argument for the AT TIME ZONE 
operator. But it doesn't adumbrate that the English word "abbreviation" 
means what I now understand it to mean. You need to get the point from 
one terse sentence without the luxury of examples to strengthen the 
explanation.

silently succeeds. Sure enough, the doc does say « PostgreSQL will 
accept POSIX-style time zone specifications, as described in Section 
B.5. [as the argument for AT TIME ZONE] » But what a quagmire of 
confusion this is. I said elsewhere that the same aim (get the local 
time at -42 hours w.r.t. UTC) can be met with transparent syntax, thus:

*select ('2021-05-22 15:00:00 America/Los_Angeles'::timestamptz at time 
zone 'UTC') - '42:00'::interval;*

I know which syntax I prefer!

That is the root of the above, preference. There are many ways to 
express time zones and as a general purpose database Postgres needs to 
deal with all of them.

----------------------------------------------------------------------

(5) I was embarrassingly slow to find this in the doc for the SET command:

« Timezone settings given as numbers or intervals are internally 
translated to POSIX timezone syntax. For example, after SET TIME ZONE 
-7, SHOW TIME ZONE would report <-07>+07. »

(David pointed this out.) This is confusing in itself. And it's 
compounded by this fact: POSIX uses positive offsets to denote what both 
the utc_offset column and to_char(<some timestamptz value>), with the 
format element TH:TM, shows as negative. But it is what it is.

See my earlier post about trying to find logic in this.

----------------------------------------------------------------------

(6) To the extent that a catalog view can have a business unique key 
(and I believe that this must be the case for it to be useful), the 
unique key for "pg_timezone_names" is "name" and the unique key for 
"pg_timezone_abbrevs" is "abbrev". I'm going to assume that PG has code 
to maintain these invariants. This gives me the way to interpret the 
statement "an [abbreviation] specification merely defines a particular 
offset from UTC".

«
The @OVERRIDE syntax indicates that subsequent entries in the file can 
override previous entries (typically, entries obtained from included 
files). Without this, conflicting definitions of the same timezone 
abbreviation are considered an error.
»

So even without "timezone_abbreviations" set to "default" at session 
level, "pg_timezone_abbrevs.abbrev" will always be unique.

True.

----------------------------------------------------------------------

(7) There are two different syntaxes for setting the timezone session 
environment variable. This (or with TO instead of =):

*set timezone =*

and this:

*set time zone*

This, too, is hugely confusing. (Correspondingly, there's both "show 
timezone" and "show time zone".)

The first "set" (and "show") alternative is consistent with how you set 
all other session environment variables. Moreover, all are observable 
with the current_setting() built-in function. And there the argument 
"timezone" can only be spelled as one word. This makes me favor the "set 
timezone" spelling.

However, only the "set time zone" spelling allows an argument that's an 
explicit interval value like this:

*set time zone interval '-08:00';*

I said "explicit interval value" because this isn't true. For example, 
these two fail:

*set time zone '-08:00'::interval;*

and

*set time zone make_interval(hours=>-8);*

both fail, even though each uses a legal interval value. This is 
confusion on top of confusion. All the more reason to avoid it.

It is spelled out here:

https://www.postgresql.org/docs/12/sql-set.html

Basically set time zone is for customizing the timezone value, whereas 
set timezone is for using 'standard' timezone values. They both end up 
setting the same value.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx