Genart last call review of draft-murchison-tzdist-tzif-14

Dale Worley <worley@xxxxxxxxxxx> · Sun, 30 Sep 2018 19:42:24 -0700

Reviewer: Dale Worley
Review result: Ready with Issues

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document:  draft-murchison-tzdist-tzif-14
Reviewer:  Dale R. Worley
Review Date:  2018-09-30
IETF LC End Date:  2018-10-09
IESG Telechat date:  unknown

Summary:
       This draft is on the right track but has open issues, described
       in the review.

Major issues:

1. The draft does not specify the versioning and compatibility
principles governing TZif files and their processors.  Certain
passages suggest that processors are expected to *not* examine the
version number in a file, but instead, extensions to the file format
are made by adding further data blocks to the end of the file, and
later-version processors handle earlier-version files by noticing that
latter data blocks are missing.  This approach implies that processors
based on earlier versions to not notice that there appears to be
trailing garbage in the file.  But silent acceptance of trailing
garbage is not specified, and this strategy is different from most
IETF standards.

2. The semantics of the various data items -- what they mean and how
they are to be used in processing -- is only hinted at.  I suspect
that the draft is targeted at members of the community who already
thoroughly understand the semantics of the data items (based on their
names), but this is not stated, nor is any reference given to where
one might learn this information.  OTOH, the glossary (in section 2)
gives definitions of several terms that suggest the document is
attempting to define the semantics, but the definitions given are
nowhere near sufficient to specify those semantics.

3. Apparently (see comments on section 4), characters outside the
ASCII set are allowed in time zone designations.  If so, their
encoding needs to be specified.

Minor issues:

Nits/editorial comments: 

1.  Introduction

   This specification does not define the source of leap second
   information, nor does it define the source the time zone data,
   metadata, identifiers, aliases, localized names, or versions as
   defined in Section 3 of [RFC7808].  One such source is the IANA-
   hosted time zone database [RFC6557].

The sequence "nor does it define the source the time zone data" is
gramatically incorrect.  I would suggest it should be "... the source
of ...", but making that change leaves the semantically heterogenous
list "the time zone data, metadata, identifiers, aliases, localized
names, or versions".  Also, it's not clear how "as defined in Section
3 of [RFC7808]" attaches to the sentence, since the closest item in
the list, "versions", is defined in section 3 and not RFC 7808.  I
suspect that some part of the wording was accidentally deleted.

2.  Conventions Used in This Document

   Daylight Saving Time (DST):  The time according to a location's law
      or practice, adjusted as necessary from standard time.  The
      adjustment may be positive, negative, or zero.

This seems to read that "Daylight Saving Time" is not "standard time"
plus the summer time offset, but standard time, adjusted by whatever
summer time offset might be in effect at the moment.  But that is not
not the definition of DST, at least, not as commonly used in the US.
What exactly is meant here?  I would prefer a much more careful
explanation of how the various terms are used.

      2.  a change in whether standard or daylight saving time is in use

Is this intended to include the regular transition between summer time
and winter time, or only when a locality starts or stops the practice
of using summer time each year or the schedule of transitions?

   UNIX Leap Time:  [...]
      Similarly, if
      the second leap second record occurs at 1972-12-31 23:59:60 UTC,
      its UNIX leap time would be 94694401; the second occurrence
      accounts for the first leap second.

I think this would be clearer as:

      Similarly, if the second leap second record occurs at 1972-12-31
      23:59:60 UTC, the UNIX leap time of 1972-12-31 23:59:60 UTC
      would be 94694401 and the UNIX leap time of 1973-01-01 00:00:00
      UTC would be 94694402.

--

      If a TZif file specifies no
      leap second records, UNIX leap time is equivalent to UNIX time.

I think s/equivalent to/equal to/ reads better here.

   Wall Time:  The time as shown on a clock set according to a
      location's law or practice.

In what way does this differ from "Local Time"?

After reading further in the document, I suspect that the structure
you are intending is:  There is a "standard time" which is derived
from UTC, and a "daylight savings time" which is derived from UTC,
*both of which run all the time*.  From those two, a "wall time" is
derived by specifying points where "wall time" switches between
"standard time" and "daylight savings time".

3.  The Time Zone Information Format (TZif)

The description here seems to be out of order.  Also, the description
of the data blocks could be clearer; it seems to me that "containing"
makes more sense in this context than "using".  A better ordering is:

   The time zone information format begins with a fixed 44-octet
   header (Section 3.1).  The TZif header contains a field which
   specifies the version of the file's format.

   The header is followed by a variable-length data block (Section 3.2)
   containing four-octet (32-bit) transition times and leap second
   occurrences.  These 32-bit values are limited to representing times
   from 1901-12-13 20:45:52 through 2038-01-19 03:14:07 UT.

   Version 1 files terminate after the 32-bit data block.
   Version 2 and 3 files extend the format by appending a second
   44-octet header, another variable-length data block containing eight-octet
   (64-bit) transition times and leap second occurrences, and a variable
   length footer (Section 3.3).  These 64-bit values can represent times
   approximately 292 billion years into the past or future.

   [Move "NOTE" to here.]

Figure "General Format of TZif Files"

I think the figure should use s/Transitions/Values/, since not all
of the values are transition times.

3.1.  TZif Header

       +---------------+---+
       |  magic    (4) |ver|
       +---------------+---+

s/ver/ver(1)/

      '2' (0x32)  Version 2 - The file MUST contain the 32-bit header
         and data block, a 64-bit header and data block, and a footer.

The phrases "32-bit header" and "64-bit header" don't work, as the
headers have the same format.  They could be called "header for 32-bit
data", etc., but that's a bit awkward.  Alternatively, names for all
of the data sections could be defined in section 2.

   typecnt:  A four-octet unsigned integer specifying the number of
      local time type records contained in the data block - MUST NOT be
      zero.  (Although time type 0 is not used in files that have
      nonempty TZ strings but no transitions, it is nevertheless
      required because many TZif readers reject files that lack time
      types.)

This is hard to understand.  I *think* "time type 0" means "time type
record 0", or "the time type defined by by time type record 0", and
the parenthesized sentence means "In files that have nonempty TZ
strings but no transitions, the time type records are not used;
nonetheless, in this case there must be one record, time type 0 (which
will be ignored) because many TZif readers reject files that lack time
type records."  Here, you are fighting against the very common usage
where a "record" of some sort has a "type" field which contains a
small integer, so you have to be very clear that the "0" *indexes* the
time type record, rather than being the value of a "type" field value
in the time type record.

3.2.  TZif Data Block

   The TZif data block consists of seven variable-length elements,
   [...]

I think you want to say "A TZif data block", since there can be two of
them in a TZif file.

   In the initial data block, time values are 32-bit (TIME_SIZE = 4
   octets).

I think it would read better to say "first data block", to parallel
"second data block" and also to clarify you are not referring to the
header.

      +---------------+-+-+---+
      |  utoff (4)    |dst|idx|
      +---------------+---+---+

There is an extra "+" in the first line, and the field lengths should
be specified:

      +---------------+------+------+
      |  utoff (4)    |dst(1)|idx(1)|
      +---------------+------+------+

--

      (is)dst:  A one-octet value indicating whether local time should
         be considered Daylight Savings Time (DST).  The value MUST be 0
         or 1.  A value of one (1) indicates that DST is in effect.  A
         value of zero (0) indicates that standard time in effect.

It seems to me to be better to phrase the last two sentences as "A
value of one (1) indicates that this time type is DST.  A value of
zero (0) indicates that this time type is standard time."  But that is
assuming that "time type" is not just describing a specific interval
of time, but rather a *type* of intervals of time.

      (desig)idx:  A one-octet unsigned integer specifying an index into
         the series of time zone designation octets, thereby selecting a
         particular designation string.  Each index MUST be in the range
         [0, 'charcnt'), and MUST index a NUL-terminated (0x00) sequence
         of octets.

The last clause isn't very useful, as it is equivalent to "there must
be a NUL at or after the idx position in time zone designations".  I
suspect the real meaning is "Each index MUST be in the range [0,
'charcnt'), and is interpreted as designating the NUL-terminated
ASCII-encoded string of characters starting at position idx in the
time zone designations.  (This string MAY be empty.)  A NUL octet MUST
exist in the time zone designations at or after position idx."

However, Appendix A interoperability item 9 implies that characters
outside the ASCII set are allowed.  If so, their encoding needs to be
specified.

3.3.  TZif Footer

The footer diagram should be:

                      +-------+--------------------+-------+
                      | NL(1) |  TZ string (0...)  | NL(1) |
                      +-------+--------------------+-------+

4.  Interoperability Considerations

   o  The sequence of time changes defined by the 32-bit header and data
      block SHOULD be a contiguous subsequence of the time changes
      defined by the 64-bit header and data block, and by the footer.
      This guideline helps obsolescent version 1 readers agree with
      current readers about timestamps within the contiguous
      subsequence.  It also lets writers not catering to obsolescent
      readers use a 'timecnt' of zero in the 32-bit data to save space.

I think s/not catering to/not supporting/, and s/32-bit data/32-bit
data block/.

      *  "application/tzif-leap" (Section 8.2) to indicate that leap
         second records are included in the TZif data.

You probably need to spell out:  "For version 1 files, leap second
records must be present in the 32-bit data block; for version 2 and 3
files, leap second records must be present in the 64-bit data block."

10.3.  URIs

This section is odd.  Item [1] is just the URL for BCP 14, but BCP 14
is also listed in the references section.  Items [6] and [7] duplicate
[8] and [9], but they're references and should be treated as such.
Items [3], [4], and [5] are more interesting -- they're locations
inside reference [POSIX].  The Editor should be able to suggest a
better way of presenting these references.

Appendix A.  Common Interoperability Issues

   Most of these are problems in generating TZif files
   for use by readers conforming to predecessors of this specification.

It would be helpful to those dealing with compatibility issues to have
references to the predecessors of this specification.