[Last-Call] Artart last call review of draft-ietf-dmarc-aggregate-reporting-23

Martin Thomson via Datatracker <noreply@xxxxxxxx> · Tue, 26 Nov 2024 17:41:03 -0800

Reviewer: Martin Thomson
Review result: On the Right Track

This document defines a mechanism for reporting the status of received mail to
the logical originator of that mail. This helps the originator understand how
the mail that they ostensibly produce, which might be distributed and
delegated, is being received.  This is a useful feedback system for mail
providers.

I want to note that a system like this seems specifically designed to reinforce
existing inequities in the mail system.  Only large actors will be able to
deploy the necessary resources and sophistication to make use of this sort of
feedback.  That's unavoidable and the system is nonetheless a useful thing to
have, but it is worth acknowledging that in building this we contribute more to
the centralization of an already-centralized system.  I don't know if this
rises to the level of notability in the draft, but I wanted to acknowledge that
here.

# High Level Stuff

I found this document pretty hard to process.  Part of that is a lack of
familiarity with mail and its attendant security apparatus (which seem designed
to defy rational analysis, see also earlier points about centralization). 
However, I found that the document lacked sufficient overview and
context-setting to be comprehensible.  Take Section 2.1 which launches right
into dense paragraphs of "A begat B; B begat C" like the parts of the Christian
Bible that people like to pretend don't exist.  Buried in that text are
incredibly useful pieces of context, like the fact that data is broken down by
IP address (presumably as observed by the mail receiver).

The list of fields contains wonderful statements like: Mandatory fields are
"domain", "p", "sp".  I searched and was unable to find what "p" might contain,
other than a comment in the XML schema, which appears in an appendix.  If the
point of this document is to define fields, then it would be best if it
contained definitions.  Maybe "sp" or "adkim" are obvious to someone versed in
the minutiae, but without citations and references, the information in the
draft is far less useful than it could be.  Having the bulk of the
specification in comments in code in an appendix is not ideal.

It is not clear to me whether the information included in a report is
aggregated (i.e., counts or similar metrics) or simply a collection of
per-message details that is gathered into a single report.  That's a problem.
The title says aggregated and there is mention of counts, but the language in a
couple of places strongly implies otherwise.  I shouldn't have to guess that
much.  Maybe this could be managed with a section describing the basic layout
of the reports.  The example at the end helped a bunch, but I'm inferring a lot
from that example where it could be spelled out clearly.

# Mid-sized stuff

S2.5 says "the Mail Receiver MAY send a short report indicating that a report
is available but could not be sent" - how?

S2.5.1 says "The aggregate data MUST be an XML file that SHOULD be subjected to
GZIP [RFC1952] compression."  Is there a mechanism by which the Domain Owner
can indicate different compression modes?  That is, is there agility for this?

S2.5.1 says "The aggregate data MUST be present using the media type
"application/gzip" if compressed (see [RFC6713]), and "text/xml" otherwise." 
This has two problems: 1. The text/xml form should be a new media type that
describes the format that this document defines.  Then, you have a hope of
evolving the format in a non-compatible way at some point in the future. 2. The
gzip form does not signal the type of the inner document, making any format
change impossible when compression is involved.
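
To illustrate the second problem, here is a minimal Python sketch.  It is not
taken from the draft: the XML snippet and the sniff_inner_format helper are
hypothetical.  A gzip stream has no field for the media type of its payload, so
a recipient handed "application/gzip" can only decompress and guess:

```python
import gzip

# Hypothetical report body; the element names are illustrative only.
report_xml = b'<?xml version="1.0"?><feedback><version>1.0</version></feedback>'

compressed = gzip.compress(report_xml)

# The gzip header starts with two magic bytes, followed by the compression
# method, flags, and a timestamp. There is no field for a media type.
assert compressed[:2] == b"\x1f\x8b"


def sniff_inner_format(blob: bytes) -> str:
    """Guess the inner format by decompressing and looking at the first byte.

    This is a heuristic (an assumption of this sketch), which is exactly the
    kind of guessing a dedicated media type would make unnecessary.
    """
    inner = gzip.decompress(blob)
    if inner.lstrip().startswith(b"<"):
        return "xml (guessed)"
    return "unknown"


print(sniff_inner_format(compressed))
```

A new report format sent inside the same gzip wrapper would look identical to
the old one from the outside.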

It's not clear to me that the strict rules regarding the construction of
filenames and subjects are justified, especially when the report contains the
same information.  Can you design a single system for carrying the necessary
information?  (I get that you might want to use something like Subject to
ensure routing to the right subsystem, but maybe limit the amount that you need
to specify to achieve that purpose only.)

S2.5 (general) Why is it the responsibility of the transport mechanism to
detect duplicates?  Can a unique identifier be added to the content of the
report?

S3 defines a validation process that involves querying DNS at "<provider
name>._report._dmarc.<target name>".  This will fail when this string is too
long, which is pretty easy to manage for an attacker.  That's an unrecoverable
error, but the procedure says nothing about that error.  Does that make certain
reporting architectures impossible for some providers?
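
A rough sketch of the length problem, in Python.  The limits are the standard
DNS ones (63 octets per label, 255 octets per name in wire format).  The helper
names and the example domains are made up for illustration, and names are
assumed to be ASCII already:

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255  # wire format: length octets plus the root label


def wire_length(name: str) -> int:
    """Octets the name occupies on the wire (ASCII names assumed)."""
    labels = name.rstrip(".").split(".")
    # Each label costs one length octet plus its bytes; add one for the root.
    return sum(len(label.encode("ascii")) + 1 for label in labels) + 1


def verification_name(provider_name: str, target_name: str) -> str:
    """Build "<provider name>._report._dmarc.<target name>"."""
    return f"{provider_name.rstrip('.')}._report._dmarc.{target_name.rstrip('.')}"


def is_queryable(name: str) -> bool:
    """True if the name fits within the DNS label and name size limits."""
    labels = name.rstrip(".").split(".")
    for label in labels:
        if not label or len(label.encode("ascii")) > MAX_LABEL_OCTETS:
            return False
    return wire_length(name) <= MAX_NAME_OCTETS


# Two names that are each valid on their own...
provider = ".".join(["a" * 63] * 3)
target = "b" * 60 + ".example"

# ...can combine into a name that cannot be queried at all.
print(is_queryable(provider), is_queryable(target))
print(is_queryable(verification_name(provider, target)))
```

The point is that neither party controls the combined length, so the procedure
needs to say what happens when the check cannot even be attempted.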

I'm not enthusiastic about the privacy considerations.  Whose privacy is
affected by leakage (S6.3)?

The schema uses xs:string for string types, which means that whitespace is
significant.  I generally advise people to use xs:token instead so that content
can be authored safely, though with the automated nature of this format, this
is unlikely to be a significant factor.
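
For a concrete picture of the difference, here is a small Python sketch.  The
collapse function is my own approximation of the XSD whiteSpace="collapse"
rule that xs:token implies, and the <domain> element is used only as an
example:

```python
import re
import xml.etree.ElementTree as ET


def collapse(value: str) -> str:
    """Approximate XSD whiteSpace=collapse, as implied by xs:token.

    Runs of space, tab, LF, and CR become a single space, and leading and
    trailing spaces are removed.
    """
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


# Hand-authored XML, indented the way a person might write it.
doc = ET.fromstring("<record>\n  <domain>\n    example.com\n  </domain>\n</record>")
raw = doc.findtext("domain")

# With xs:string the surrounding whitespace is part of the value.
print(repr(raw))
# With xs:token a validating consumer would see the collapsed value.
print(repr(collapse(raw)))
```

With xs:string, a consumer comparing the raw value against "example.com" would
get a mismatch unless it trims the value itself.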

The schema defines a number of enums, which seem like they might be problematic
if you ever need to extend the value space.  I'm looking at DKIMResultType and
SPFResultType as prime examples of something that might need to be extended. 
In these cases, I generally recommend xs:token as well, pointing at a registry
for the valid values.

The schema definition for TestingType is a boolean, but it doesn't use
xs:boolean.  Why?  Same for DMARCResultType.

Why do the dates in the format not use the XML xs:dateTime construction?
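
As a sketch of what the alternative would look like, assume the format carries
dates as integer seconds since the Unix epoch (that is my assumption here, not
something quoted from the draft).  Converting to and from the xs:dateTime
lexical form in UTC is short; the function names are hypothetical:

```python
from datetime import datetime, timezone

XS_DATETIME_UTC = "%Y-%m-%dT%H:%M:%SZ"


def epoch_to_xs_datetime(seconds: int) -> str:
    """Render integer epoch seconds as an xs:dateTime string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(XS_DATETIME_UTC)


def xs_datetime_to_epoch(value: str) -> int:
    """Parse the UTC form produced above back into epoch seconds.

    Only the "Z" form without fractional seconds is handled; full xs:dateTime
    also allows offsets and fractions, which this sketch ignores.
    """
    parsed = datetime.strptime(value, XS_DATETIME_UTC).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(epoch_to_xs_datetime(86400))
```

The xs:dateTime form is readable without tooling and can be validated by the
schema itself, which an integer cannot.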

Why is there a specific <extension> container at the top level, but not for
each record?  I would have thought that extending in the same way for each
would be better.

# Small stuff

S1 mentions terminology that might be better moved to S1.1

S2.2 says "There MAY be optional sections for extensions within the document."
<- this is not a "MAY", this is either an "is" or "is not" (I'm guessing "is").

A few lines in the appendix are too long for the RFC format (I see one at 75
characters).

In the acknowledgments, this looks like a serious error: "Kvå (U+00E5)l".  Are
you using <u>?
