Re: The contents of the pg_timezone_names view bring some surprises

Bryn Llewellyn <bryn@xxxxxxxxxxxx> · Wed, 19 May 2021 17:50:07 -0700

tgl@xxxxxxxxxxxxx wrote:

david.g.johnston@xxxxxxxxx wrote:

bryn@xxxxxxxxxxxx wrote:

Am I missing an essential clue to resolving what seems to me to be a paradox? Or am I seeing two kinds of bug?

You are missing the material in appendix B.4. Date/Time Configuration Files
https://www.postgresql.org/docs/current/datetime-config-files.html

The short answer is that the reason there are two views is that there are two sources of truth involved. pg_timezone_names reflects the zone names defined in the IANA timezone database, while pg_timezone_abbrevs reflects the abbreviations defined in our user-customizable abbreviations table.  It'd be impossible to make them match exactly, and we don't try exceedingly hard.  In particular, the IANA list has some zones such as "CET" that don't follow their own continent/city naming convention.  (AFAIK those are all legacy zones that they'd get rid of if they weren't concerned with backwards compatibility.)  If those look like abbreviations, which they mostly do, then it's confusing.

Where the rubber meets the road is in timestamptz input, and there we consult the abbreviations table first.  (Not sure if that's documented, but you can easily prove it by experiment.)

As for the question about "abbreviations" like +09 --- those are not abbreviations at all, they're just hard-coded numeric UTC offsets.  So they don't appear in pg_timezone_abbrevs.  IANA uses those as display offsets in zones where there's not any widely-used-on-the-ground abbreviation.

Thanks, as ever, David and Tom, for your quick responses. Thanks also to Adrian Klaver, who replied in a branched thread with this—in response to my comment about my reading of the information content of the pg_timezone_abbrevs view: « This claims (as I read it) that a time zone abbreviation uniquely determines an offset from UTC. » 

adrian.klaver@xxxxxxxxxxx wrote:

It says no such thing and would be lying if it did. Take a look at this:

https://en.wikipedia.org/wiki/List_of_time_zone_abbreviations

and see the abbreviations that share offsets.

What it is saying that, for example,  the timezone America/Los_Angeles has two timezone abbreviations PDT (what I'm currently in) and PST. If you use an abbreviation you don't get the DST transition rules that a full timezone name has.

______________________________________________________________________

Firstly, David's response.

I (re)read "B.4. Date/Time Configuration Files" and located this directory on my macOS Big Sur for my PG Version 13.2 installation:

/usr/local/share/postgresql/timezonesets

I saw that it has files with names like America.txt, Asia.txt, Europe.txt, and so on. I opened America.txt and read this:

# NOTE:
#   This file is NOT loaded by the PostgreSQL database.  It just serves as
#   a template for timezones you could need.  See the `Date/Time Support'
#   appendix in the PostgreSQL documentation for more information.

The x-ref'd Appendix doesn't x-ref further to an explanation of the steps to follow in order actually to add timezons that I might need. It does have an entry for PST.

# CONFLICT! PST is not unique
# Other timezones:
#  - PST: Philippine Standard Time
PST    -28800    # Pacific Standard Time

Note the comment about a conflict. This suggests to naïve readers like me that conflicts are to be avoided. It also has a PDT entry.

I also looked in Asia.txt and found this:

# CONFLICT! PST is not unique
# Other timezones:
#  - PST: Pacific Standard Time (America)
PST     28800    # Philippine Standard Time

I s'pose that this is what David wanted me to see. Conflicts are a bad thing—but not disallowed.

This does mean that the outcome of this:

select ('2000-01-01 12:00:00 +00:00'::timestamptz) at time zone 'PST';

is undefined (unless I'm missing a statement of the priority rule for handling conflicts elsewhere in the PG doc) as I said in my email that started this thread, using a different example.
______________________________________________________________________

Secondly, Adrians's response.

Yes, the point that a timezone abbreviation does not uniquely determine the timezone offset is taken now. But notice this:

« In short, this is the difference between abbreviations and full names: abbreviations represent a specific offset from UTC…»
from

"8.5.3. Time Zones"
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-TIMEZONES

This seems to me to be flat-out wrong. An abbreviation, in general, does not represent a specific offset from UTC. Rather, it can represent two or more different offsets.

This tells me that I must recommend never to use a timezone abbreviation in application code. It's anyway illegal as an argument for "set timezone". (If the same text happens to be a timezone name, well… that's what it is in that context.) And the result of using a timezone abbreviation in the "at time zone" clause is, in general, unspecified.

I'm convinced that the typical use case calls for using time zone names—exactly, as as been said, to benefit from the encoded DST transition rules that this brings. If I want to arrange a Zoom call with a colleague in Helsinki on, say, the Monday after the San Francisco "spring forward" weekend, and I don't remember when Finland springs forward, then I can do this to see what local time it is over there at my 09:00:

select '2021-03-20 09:00:00 America/Los_Angeles'::timestamptz at time zone 'Europe/Helsinki';

And if, for some strange reason, I want to find out what my local time in this example is in a +03:00 timezone (as this is shown in a ::text typecast of  timestamptz value and not as POSIX has it) not caring about who does what with DST, I can spell it as I mean it:

select ('2021-03-20 09:00:00 America/Los_Angeles'::timestamptz at time zone 'UTC') + '3 hours'::interval;

This formulation is self-documenting. And it sidesteps all that rubbish about POSIZ saying "up" for what everybody else calls "down".

Because the purpose for the pg_timezone_abbrevs view (as it seems to me) is to control what's legal in the "at time zone" clause, I can forget it. Whatever it might say about utc_offset and is_dst is available, against the unique name, in pg_timezone_names.

This is an instructive example:

set timezone = 'America/Los_Angeles';
select
  to_char('2021-11-07 08:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "1st 1:30",
  to_char('2021-11-07 09:30:00 UTC'::timestamptz, 'hh24:mi:ss TZ (OF)') as "2nd 1:30";

This is the result:

      1st 1:30      |      2nd 1:30      
--------------------+--------------------
 01:30:00 PDT (-07) | 01:30:00 PST (-08)

The "(-07)" and "(-08)" convey real information. But the "PDT" and "PST" do not—except in the sense that one is culturally aware and can Google for things. The pg_timezone_names view never has PDT and PST at the same querying moment. Rather, it has only one according to the date. And this

select abbrev, utc_offset, is_dst::text
from pg_timezone_abbrevs
where abbrev in ('PST', 'PDT');

with this result:

 PDT    | -07:00:00  | true
 PST    | -08:00:00  | false

gives just an illusion of value because the two abbreviations happen each to be unique in that view. Each could easily have many rows that, for each abbreviation, mix "true" and "false".
______________________________________________________________________

Thirdly, Tom's response.

About « the reason there are two views is that there are two sources of truth involved », well yes… and who wants two sources of truth? All the more reason to forget the pg_timezone_abbrevs view and never to use an abbreviation in the "at time zone" clause.

About « the IANA list has some zones such as "CET" that don't follow their own continent/city naming convention… If those look like abbreviations, which they mostly do, then it's confusing. », what about "UTC" itself. Sources like this"

The Difference Between GMT and UTC
https://www.timeanddate.com/time/gmt-utc-time.html

point out that UTC is a time standard and not a time zone. But it's still useful to have it listed in pg_timezone_names. It's a challenge to work out how to recommend what subset of what's listed there to use. This page:

List of tz database time zones
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

is helpful because it shows the "cannonical" or not status of each entry. But consider this:

select name, abbrev, utc_offset, is_dst::text
from pg_timezone_names
where
  lower(name) like 'etc/gmt%0%' or
  lower(name) like 'etc/gmt%1%' or
  lower(name) like 'etc/gmt%2%' or
  lower(name) like 'etc/gmt%3%' or
  lower(name) like 'etc/gmt%4%' or
  lower(name) like 'etc/gmt%5%' or
  lower(name) like 'etc/gmt%6%' or
  lower(name) like 'etc/gmt%7%' or
  lower(name) like 'etc/gmt%8%' or
  lower(name) like 'etc/gmt%9%'
order by utc_offset;

with this result:

 Etc/GMT+12 | -12    | -12:00:00  | false
 Etc/GMT+11 | -11    | -11:00:00  | false
 Etc/GMT+10 | -10    | -10:00:00  | false
 ...
 Etc/GMT+1  | -01    | -01:00:00  | false
 Etc/GMT+0  | GMT    | 00:00:00   | false
 Etc/GMT-0  | GMT    | 00:00:00   | false
 Etc/GMT0   | GMT    | 00:00:00   | false
 Etc/GMT-1  | +01    | 01:00:00   | false
 ...
 Etc/GMT-12 | +12    | 12:00:00   | false
 Etc/GMT-13 | +13    | 13:00:00   | false
 Etc/GMT-14 | +14    | 14:00:00   | false

(Notice the "up is down" POSIX silliness.) The Wikipedia "List of tz database time zones" has each of these as canonical. But I intend to recommend avoiding using these for two reasons. First reason: 'cos each name contains a number, it isn't controlled by the names in pg_timezone_names (like David pointed out in another email).

Try this (simulating a keystroke bounce typo):

set timezone = 'Etc/GMT+122';
show timezone;

It's silently accepted and gives "ETC/GMT+122" with "show". And then, in turn, this:

select '2021-03-20 09:00:00'::timestamptz;

gives this result:

 2021-03-20 09:00:00-122

Nonsense, eh? As David said, it's an instance of the more general:

set timezone = 'Foo42Bar';
show timezone;

I wish there was a way to turn this off and accept only pg_timestamp_names.name values.

The second reason is that the abbreviations confuse ordinary readers who are slow to remember the "up is down" story.