Re: [Last-Call] last call review of draft-ietf-ntp-chronos-16

Neta R S <neta.r.schiff@xxxxxxxxx> · Sun, 2 Jul 2023 22:25:31 -0400

Hi Dave,
Thanks for your comments.
Please see my replies inline below (marked with ">>").

Best,
Neta

On Sun, Jun 25, 2023 at 8:41 AM Dave Hart <davehart@xxxxxxxxx> wrote:
Section 3.1 "Khronos Calibration" discusses gathering 1000 NTP server addresses.  This number is the variable n, it seems worth mentioning.  
 >> We added a clarification regarding it. 

It mentions gathering from "pool.org" when presumably "pool.ntp.org" was intended.  I don't see any explanation in that section of the calibration process beyond gathering addresses.  If Khronos calibration consists solely of gathering n servers, perhaps a different description such as "Gathering the Khronos Pool" would be more appropriate.
>> Thanks, we fixed that.

There is discussion in the same section of the number of queries being less than 10 per day and comparison to the average number of DNS queries per machine.  I think it's more relevant to compare the number of queries Khronos causes to the number of queries NTP implementations cause.  Current NTP implementations trigger DNS queries only during startup, and rarely thereafter to gather more pool servers if some stop responding.  I'm currently testing changes to ntpd to better eject nonresponsive pool servers and servers which are not contributing to the time solution.  At the request of the pool.ntp.org administrator, I will also be adding logic to replace pool servers that are responsive and contributing after a few weeks, to allow server operators who remove a server from the pool to more quickly see the requests taper off.  Miroslav Lichvar has recently implemented the latter change in Chrony.  Even so, I expect Khronos to be generating substantially more queries to pool.ntp.org than ntpd or chrony alone.  If Khronos persists its collection of n server addresses across restarts, it would reduce the number of queries, but it would also contribute to the problem of servers removed from the pool continuing to see queries for years thanks to long-running systems.
 >> As Khronos’s security results from potentially using more NTP servers, it inherently needs to make more DNS queries than NTP. In cases of frequent restarts, the Khronos pool may need to persist across restarts, but we can keep it time bounded. Just an idea: perhaps as part of evicting servers from the pool you could add them to a “recently evicted pool” (e.g., notinpool.ntp.org) to notify their current clients (similar to certificate revocation lists).
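
A minimal sketch of what "time bounded" persistence could look like. This is an illustration only and is not taken from the draft: the function names, the JSON layout, and the max_age_s parameter are all hypothetical. The stored pool is reused after a restart only while it is younger than a configured age; otherwise the caller gathers a fresh pool.

```python
import json


def dump_pool(addresses, gathered_at):
    """Serialize the gathered server addresses with their collection time.

    Hypothetical helper: the draft does not define a storage format.
    """
    return json.dumps({"gathered_at": gathered_at, "addresses": sorted(addresses)})


def load_pool(serialized, now, max_age_s):
    """Return the stored addresses, or None if the record is too old.

    max_age_s is an assumed configuration knob bounding how long a
    persisted pool may be reused across restarts.
    """
    record = json.loads(serialized)
    age = now - record["gathered_at"]
    if age < 0 or age > max_age_s:
        return None  # stale (or clock went backwards): gather again
    return record["addresses"]
```

A shorter max_age_s means removed pool servers stop seeing queries sooner, at the cost of more DNS lookups after restarts.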

Section 3.2 "Khronos' Poll and System Processes" describes a process of querying m servers and eventually refers to this as a sampling and sometimes resampling process.  It would be clearer if the first mention in the section of querying m servers were described as a sampling process, rather than introducing the term in the sentence mentioning resampling.
 >> Thanks, following your comment I added an explicit initial sampling reference to clarify the resampling.
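
For readers following the terminology, the sampling step can be sketched roughly as below, assuming the pool is a plain list of n addresses and that each (re)sampling draws m distinct servers uniformly at random. The function name and the use of Python's random module are illustrative assumptions, not the draft's pseudocode.

```python
import random


def sample_servers(pool, m, rng=None):
    """Draw m distinct servers uniformly at random from the pool of n.

    Resampling is simply another call to this function.
    """
    rng = rng or random.Random()
    if m > len(pool):
        raise ValueError("m must not exceed the pool size n")
    return rng.sample(list(pool), m)
```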

Later in the same section there is "Note that whether the client allows panic mode or not is configurable."  This configuration option is not mentioned elsewhere and seems to be discussing a particular implementation of the algorithm, rather than the algorithm itself.
>> You are right, this must have been left from older versions so I removed it.

The Khronos draft refers to a number of variables controlling the algorithm behavior: n, m, w, B, ERR, K, H.  Section 3.3 "Khronos' Recommended Parameters" does not discuss B, ERR, or H.  Section 4.2 gives a default value of 30ms for H.  That default and the reasoning might be better placed in section 3.3.
 >> Thanks for the comment, we fixed it accordingly.

Grammar nit: Section 4.3 "Security Analysis Overview" includes "Therefore, the probability that the attacker repeatedly reach this scenario decreases exponentially, rendering the probability of a significant time shift negligible."  It should be "reaches".  Another nit in that section:  "(with the previous parameters of n=500, m=15, w=25 and k=3)"  The k should be K (uppercase) as elsewhere in the document.
>> Thanks, we fixed it. 

The pseudocode in section 5 has differing comments for the invocations of bi-sided-trim().  I think the second one is correct.
>> Thanks, we fixed it.  
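
As a point of reference for what bi-sided-trim() is expected to do, here is a hedged sketch: sort the sampled offsets and drop the d lowest and d highest values. The parameter d is an assumption made for this illustration; the draft's pseudocode defines the actual trimmed fraction.

```python
def bi_sided_trim(offsets, d):
    """Return the sorted offsets with the d smallest and d largest removed.

    d is a hypothetical parameter for this sketch, not a name from the draft.
    """
    if d < 0 or 2 * d >= len(offsets):
        raise ValueError("d must leave at least one surviving sample")
    ordered = sorted(offsets)
    return ordered[d:len(ordered) - d]
```

Trimming both ends means a minority of extreme samples, whether too early or too late, cannot pull the surviving set in either direction.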

Section 6 "Precision vs. Security" states: "Under attack, Khronos takes control over the client's clock, mitigating the time shift while guaranteeing relatively high accuracy (the error is bounded by H)."
H is described as "Predefined threshold for time offset triggering clock update by Khronos."  H is the threshold for the difference between Khronos' offset estimation and NTP's.  I would like to see an explanation of how it bounds the error of the clock when Khronos is controlling it.
>> Given the analysis showing that taking m random samples provides Khronos with an accurate time estimation with very high probability, if we allow ntpd to change the clock by no more than H (milliseconds) from Khronos, then (with high probability) we bound the client clock error by H (milliseconds).
>> We added a clarification for that in the draft. 
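
The argument above can be sketched as a simple comparison, assuming both offsets are expressed in milliseconds. The function and variable names are illustrative only; the 30 ms default is the value for H quoted earlier in this thread.

```python
H_MS = 30.0  # default for H mentioned in this thread (section 4.2 of the draft)


def select_offset_ms(ntp_offset_ms, khronos_offset_ms, h_ms=H_MS):
    """Pick the offset to apply to the clock.

    NTP's offset is accepted only while it stays within H of Khronos'
    estimate; otherwise Khronos' own estimate is used instead.
    """
    if abs(ntp_offset_ms - khronos_offset_ms) <= h_ms:
        return ntp_offset_ms
    return khronos_offset_ms
```

Under this sketch the applied offset never differs from Khronos' estimate by more than H, so the bound on the true clock error holds only to the extent that Khronos' estimate is itself accurate, which is the high-probability part of the claim.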
It is unclear to me what is meant by precision as opposed to accuracy in section 6.  In NTPv4, precision has a narrow definition as the time to read the system clock, forming a lower bound on the estimated offset from UTC.
>> Here we use the common definition of precision, i.e., consistency. In any case, our main point is that in the absence of an attack, NTPv4 is used as is and all its properties are preserved.

Also in section 6 is the recommendation to use Khronos on all hosts in scenarios such as "multi source media streaming."  I'm not familiar with that scenario.  An explanation of it or a footnote link to one would be helpful.  Given the higher load Khronos imposes on the pool.ntp.org DNS and NTP servers, I would hope such a recommendation would be limited to cases where all the hosts are exposed to untrusted attackers.  A more considerate approach would be to have a very small number of servers using Khronos and authenticated NTP between the individual streaming hosts and the servers protected by Khronos.  A nit:  It may be preferable to use "advisable" rather than "advised" in that sentence.
 >> This is a recent addition suggested in this thread less than a month ago by the reviewers in order to address more use cases. I agree with the alternative approach and I added it to the draft.

Assuming there exists a prototype Khronos implementation, I'm disappointed I'm unable to find it published.  I would like to see a pointer to a reference implementation included in the document so that people can see it in action and offer practical feedback on its behavior.
>> The Khronos implementation is currently an active project of the NTF, which can easily be found (via Google). When the implementation is completed, it will be easily accessible from the project page.

-- 
Cheers,
Dave Hart
davehart@xxxxxxxxx
hart@xxxxxxx
