Re: [Last-Call] [Ext] Dnsdir last call review of draft-ietf-dnsop-dns-error-reporting-06

Roy Arends <roy.arends@xxxxxxxxx> · Mon, 6 Nov 2023 13:44:20 +0000

Thanks Dave,

Comments inline.

> On 5 Nov 2023, at 15:04, David Lawrence via Datatracker <noreply@xxxxxxxx> wrote:
> 
> Reviewer: David Lawrence
> Review result: Ready with Issues
> 
> Hi Roy and Matt, this is my review on behalf of the DNS Directorate.
> 
> There are only a couple of minor substantive points that I think
> really should be addressed before publication, and then my usual wordy
> musings on a number of other nits.
> 
> Minor substantive points:
> 
> Section 4 doesn't really describe which responses should get the
> Report-Channel option pushed.  Obviously it requires EDNS0, and I'd
> guess I don't want to include it for unsigned responses. I'm also
> guessing I'd not send it if DO were clear, and that I probably would
> send it for any other DNSSEC-related query including for DNSSEC meta
> records.

Some errors are not related to DNSSEC.

> As an implementer, I'd prefer to not have to guess at those things and
> rather be told when I should be including it.  Apparently in section 6.2.

Why would you, as an implementor, guess? 

If the authoritative server is configured to include the EDNS0 Report-Channel option, it should include the EDNS0 Report-Channel option, no?

I rather not dictate local policy in this draft but leave it for an eventual BCP, when we have a set of current practice to select the best from.

> Also, for Section 5, is 0 an okay OPTION-LENGTH

No.

> or must it be minimum
> 1 with an AGENT-DOMAIN of \0?  

No.

> Similarly, in 6.1 there is "returned an
> empty agent domain", which is described more expansively in the
> Overview as "empty, or is the null label" and suggests 6.1 should be
> clearer on what constitutes an "empty" agent domain.

I’ve added the word “field” to make it clear. It now reads:

The reporting resolver MUST NOT use DNS error reporting if the
authoritative server returned an empty agent domain field in the
EDNS0 Report-Channel option.

In section 5, it now reads:

AGENT DOMAIN:  A fully qualified domain name [RFC8499] in
      uncompressed DNS wire format.  This field MUST NOT be empty, or
      contain the null label.

> Bag o' Nits:
> 
> Hyphenate adjectival phrase:
> s/stale DNSSEC signed zone/stale DNSSEC-signed zone/.

fixed.

> For the Terminology section, perhaps start with something like, "DNS
> Terminology used in this document is from [BCP219], with these
> additions:" Because, for example, the first item, "reporting
> resolver", uses "validating recursive resolver" to define it.  (I'll
> note, however, that BCP 219 defines "validating resolver" but not
> "validating recursive resolver".  I'm not sure that "recursive" is a
> strictly necessary qualifier here.) (Also, I don't remember what the
> IETF style guidance is on reference by BCP# or by RFC#, and you did
> include BCP219 as RFC8499 later on.)

Fixed. I’ve added the following line:

"DNS Terminology used in this document is from BCP 219 [RFC8499], with these additions:”

And I removed the “recursive” qualifier to comply with BCP 219

> s/wireformat/wire format/g per usage in BCP 219.

fixed.

> "The reporting resolver builds this QNAME by concatenating the _er
> label" has _er unquoted here, but quoted later in the same sentence.
> They should be consistent; I favour the quotes.  But also, the actual
> algorithm is specified in 6.1.1, so restating it here without even
> referencing that 6.1.1 is the authoritative definition is not ideal
> and perhaps should just be "... builds this QNAME by the algorithm in
> Section 6.1.1."
> Relatedly, though 4.2 spells it out, as I was reading this paragraph I
> wanted to see an example right away.  Because you said "label" as an
> old DNS geek I could infer you meaxnt "prepend _er with a dot" but I
> did first have the thought flash through my mind about whether the dot
> was there.

> Maybe I'm odd that way, but scrolling down and back to satisfy the
> visualization is slightly more cognitive work.  "See the example in
> Section 4.2" could be something like.  "This results in a name like
> _er.1.broken.test.7._er.a01.agent-domain.example., as constructed in
> the example of section 4.2."

I’ve added:
This results in a name like _er.1.broken.test.7._er.a01.agent-domain.example., as constructed in the example of Section 4.2 and via the alogorithm in Section 6.1.1.

> "The report query will ultimately arrive at the monitoring agent" is
> kind of odd to me, not because it isn't true but because it seems like
> a basic property of the DNS that we've been relying on for 40 years.
> The first three sentences could be shortened up a bit to something
> like, "The monitoring agent's reply to the report query MUST be cached
> by the reporting resolver."  MUST describes it as essential, and the
> remaining sentences describe why.

Agreed. I’ve changed that to:

The response from monitoring agent to the report query MUST be cached by the reporting resolver. 

> I'd like to see some recommendations for a suitable TTL on the
> reporting agent's reply.

I don’t. For some, an hour is too long. For others, a day is too short. It depends entirely on the operational model, or can even rely on the type of error received, and domain involved. Operational guidance (maybe a BCP) is something that can be build once there is actually some operational experience.

>  Also, why no guidance on the TXT rdata?
> Even something like "it could be as short as a null record to minimise
> cache overhead, or could contain additional information that the
> authority wishes to communicate to the resolver."

See my operational guidance comment above.

> For "This QNAME indicates extended DNS error 7", append ", Signature
> Expired, " as a useful appositive for the reader.

Added.

> The last paragraph of section 4.2 seems overlong. The first
> parenthetical is an unnecessary restatement of Terminology. "The agent
> can determine" sentence basically restates the last sentence of the
> prior paragraph. Finally "The monitoring agent can contact the
> operators" sentence, while it does explicitly recognize that the
> operator can be a different entity, reads a little odd for what I
> would expect to be the most common case of monitor and operator being
> the same entity.  It also isn't clear that it adds anything that isn't
> intrinsically obvious, that the report could then be the trigger for
> fixing the problem.  It looks to me like the whole last paragraph could
> be dropped with no important loss of meaning.

I’ll leave it in. It is part of an example that imho needs to be a complete process, including an illustration of what can be done with such a report.

> "The DNS class is not specified in the error report."  Huh, so, yeah,
> that's interesting.  I honestly hadn't even though about class until
> this moment.  Maybe "SHOULD assume IN" is a worthwhile addition
> somewhere, since that's nearly the entirety of DNS traffic, and the
> only class for which DNSSEC is currently defined?  (I mean, if you
> want to start signing CHAOS TXT and reporting on failures, we've got a
> lot of other work to do.)

ok.

> "The reporting resolver MUST NOT use DNS error reporting to report a
> failure in resolving the report query." This feels ambigous to me,
> because even as an old DNSSEC geek I would, in the vernacular,
> describe a failure to validate as a failure to resolve.  Short example
> phrases of what sort of thing you don't want to see happen would be
> good.

No.

The point is that when you send an error report, and there is a resolution failure, you want to avoid sending an error report on the resolution failure, simply to avoid the obvious potential for looping. Nothing ambiguous about that.

> 6.1.1, "When the specified agent domain is empty, or a null label
> (despite being not allowed in this specification), the report query
> will have "_er" as a top-level domain as a result and not the original
> query."  Is this sentence useful?  I can see it as a bit of a
> rationale for why to have a second _er, but the first sentence already
> provides sufficient rationale, and an empty/root agent domain already
> means the rest of this section is irrelevant.

This is indeed a rationale. I’ll leave it in for completeness.

> 6.3 I'd put RECOMMENDED TTL range here, as just how often the agent
> wants to keep hearing about failures for the same problem, and adding
> consideration for just how many resolvers could be responding for how
> many q-tuples.  Even just magnitude would help.  (Me?  In the absence
> of any guidance I'd probably go with 15 minutes but could see anything
> from 5 to 60 being pretty reasonable.)

Nope, no recommendation on TTL here. I’ve touched upon this above.

> s/Well known addresses/Well-known addresses/

Fixed, thanks!

Thanks Tale! I really appreciate your thorough analysis.

Warmly,

Roy
<<attachment: smime.p7s>>
-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call