[Last-Call] Re: [dmarc-ietf] Artart last call review of draft-ietf-dmarc-aggregate-reporting-23

"Daniel K." <daniel@xxxxxxxx> · Wed, 27 Nov 2024 20:24:51 +0000

On 11/27/24 01:41, Martin Thomson via Datatracker wrote:
> Reviewer: Martin Thomson
> Review result: On the Right Track

Thank you, Martin, for your review. I don't have all the answers, but
I'll try to help where I can.

> # High Level Stuff
> 
> I found this document pretty hard to process.  Part of that is a lack of
> familiarity with mail and its attendant security apparatus (which seem designed
> to defy rational analysis, see also earlier points about centralization). 
> However, I found that the document lacked sufficient overview and
> context-setting to be comprehensible.

I don't know if anyone told you that this is the companion document to
the DMARCbis draft. Many of the 'mystic chants' you write about below
should become clearer once you are familiar with that draft too.

> Take Section 2.1 which launches right
> into dense paragraphs of "A begat B; B begat C" like the parts of the christian
> bible that people like to pretend doesn't exist.  Buried in that text are
> incredibly useful pieces of context, like the fact that data is broken down by
> IP address (presumably as observed by the mail receiver).

I read through it and saved some notes.

> The list of fields contains wonderful statements like: Mandatory fields are
> "domain", "p", "sp".  I searched and was unable to find what "p" might contain,
> other than a comment in the XML schema, which appears in an appendix.  If the
> point of this document is to define fields, then it would be best if it
> contained definitions.  Maybe "sp" or "adkim" are obvious to someone versed in
> the minutiae, but without citations and references, the information in the
> draft is far less useful than it could be.  Having the bulk of the
> specification in comments in code in an appendix is not ideal.

The field list is a condensed prose version of the XSD schema. It has
also become slightly incorrect. I'll prepare some fixes and discuss them
on the list.

The fields themselves are defined in Section 4.7 of dmarcbis, and those
values are reused here.

It is not immediately obvious to me how we can put all the gritty
details in the prose; after all, it is the XSD that defines the format
for the reports.

> It is not clear to me whether the information included in a report is
> aggregated (i.e., counts or similar metrics) or simply a collection of
> per-message details that is gathered into a single report.  That's a problem.
> The title says aggregated and there is mention of counts, but the language in a
> couple of places strongly implies otherwise.

You have the advantage of new eyes. Could you point out which sentences
you think imply no aggregation?

We'd do well to change them if that is the common impression.

> I shouldn't have to guess that
> much.  Maybe this could be managed with a section describing the basic layout
> of the reports.  The example at the end helped a bunch, but I'm inferring a lot
> from that example where it could be spelled out clearly.
> 
> # Mid-sized stuff
> 
> S2.5 says "the Mail Receiver MAY send a short report indicating that a report
> is available but could not be sent" - how?

By email. This is for the case where the report exceeds the announced
size limit; the Mail Receiver can then send a short message saying so.

> S2.5.1 says "The aggregate data MUST be an XML file that SHOULD be subjected to
> GZIP [RFC1952] compression."  Is there a mechanism by which the Domain Owner
> can indicate different compression modes?  That is, is there agility for this?

GZIP is enough for everyone. Requiring implementations to support
multiple compression formats would just be a hassle.

We discussed adding a tag to the DNS TXT record to indicate support for
different compression formats, but the consensus was that no-one would
bother with it.
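
To illustrate how little is involved with a single format, here is a
minimal Python sketch of packaging a report body. The function name
package_report and the sample XML are hypothetical, not taken from the
draft; the two media types are the ones quoted from S2.5.1 below.

```python
import gzip

def package_report(xml_bytes: bytes, compress: bool = True) -> tuple[bytes, str]:
    """Return (payload, media_type) for an aggregate report body."""
    if compress:
        # GZIP (RFC 1952) is the only compression the draft mentions.
        return gzip.compress(xml_bytes), "application/gzip"
    return xml_bytes, "text/xml"

# Placeholder content, not a valid report.
xml = b"<feedback><version>1.0</version></feedback>"
payload, media_type = package_report(xml)

# The receiver reverses the step with a single call.
assert gzip.decompress(payload) == xml
```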

> S2.5.1 says " The aggregate data MUST be present using the media type
> "application/gzip" if compressed (see [RFC6713]), and "text/xml" otherwise." 
> This has two problems: 1. The text/xml form should be a new media type that
> describes the format that this document defines.  Then, you have a hope of
> evolving the format in a non-compatible way at some point in the future. 2. The
> gzip form does not signal the type of the inner document, making any format
> change impossible when compression is involved.

We always try to stay as compatible as possible. Any incompatible
future change to the format could be signalled by increasing the value
of the "version" element in the report, or by changing the xmlns.
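
As a rough sketch of what that could look like on the receiving side,
a consumer can read both signals before deciding how to parse the rest.
This assumes the "version" element is a direct child of the root
element; the function name report_format and the namespace URI in the
sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

def report_format(xml_text: str) -> tuple[str, str]:
    """Return (namespace, version) taken from a report's root element."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].partition("}")[0]
        version = root.findtext("{%s}version" % namespace, default="")
    else:
        namespace = ""
        version = root.findtext("version", default="")
    return namespace, version.strip()

# Hypothetical namespace URI, for illustration only.
sample = '<feedback xmlns="urn:example:report-2"><version>1.0</version></feedback>'
print(report_format(sample))
```

A processor could refuse, or switch parsers, when either value is one
it does not recognise.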

> It's not clear to me that the strict rules regarding the construction of
> filenames and subjects is justified, especially when the report contains the
> same information.  Can you design a single system for carrying the necessary
> information?  (I get that you might want to use something like Subject to
> ensure routing to the right subsystem, but maybe limit the amount that you need
> to specify to achieve that purpose only.

It is helpful to have unique filenames, e.g. if the reports are stored
in a directory before processing.

> S2.5 (general) Why is it the responsibility of the transport mechanism to
> detect duplicates?  Can a unique identifier be added to the content of the
> report?

It's not; duplicate detection is done on the receiving end, while
processing the report.

The unique id, defined in 2.5.1, is put in the XML "report_id" element
as well.

  This identifier MUST be unique among reports to the same
  domain to aid receivers in identifying duplicate reports
  should they happen.

Or maybe I misunderstand your use of transport mechanism.
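
For what it's worth, receiver-side duplicate detection can be as small
as the Python sketch below. The function name is_duplicate is
hypothetical, and keying on the reporter in addition to the domain and
"report_id" is my assumption about how a processor would scope the
identifier, not something the draft spells out.

```python
def is_duplicate(seen: set, reporter: str, domain: str, report_id: str) -> bool:
    """Record a report; return True if the same one was already recorded."""
    # Domain names compare case-insensitively; the id is kept verbatim.
    key = (reporter.lower(), domain.lower(), report_id)
    if key in seen:
        return True
    seen.add(key)
    return False

seen = set()
print(is_duplicate(seen, "mail.example.net", "example.com", "abc-123"))  # first sighting
print(is_duplicate(seen, "mail.example.net", "example.com", "abc-123"))  # repeat
```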

> S3 defines a validation process that involves querying DNS at "<provider
> name>._report._dmarc.<target name>".  This will fail when this string is too
> long, which is pretty easy to manage for an attacker.  That's an unrecoverable
> error, but the procedure says nothing about that error.  Does that make certain
> reporting architectures impossible for some providers?

The attacker in this scenario is whoever is in control of the DNS for
the author domain. If you are sending e-mail from an exceptionally long
domain name, do not use a third party to handle your DMARC reports.
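
An implementation can at least detect the case up front. The sketch
below is in Python, with hypothetical function names, and assumes both
names are already in ASCII (A-label) form. It uses the general DNS
limits: 63 octets per label, and 253 characters for a name in text
form without the trailing dot.

```python
MAX_NAME_LENGTH = 253   # text form, without the trailing dot
MAX_LABEL_LENGTH = 63

def verification_name(provider: str, target: str) -> str:
    """Build "<provider name>._report._dmarc.<target name>"."""
    return f"{provider.rstrip('.')}._report._dmarc.{target.rstrip('.')}"

def fits_in_dns(name: str) -> bool:
    """True if the name and each of its labels are within the DNS limits."""
    labels = name.split(".")
    return (len(name.encode("ascii")) <= MAX_NAME_LENGTH
            and all(0 < len(label) <= MAX_LABEL_LENGTH for label in labels))

name = verification_name("example.com", "reports.example.net")
print(name, fits_in_dns(name))
```

When fits_in_dns returns False the query cannot be made at all, so the
external destination cannot be verified.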

> I'm not enthusiastic about the privacy considerations.  Whose privacy is
> affected by leakage (S6.3)?
> 
> The schema uses xs:string for string types, which means that whitespace is
> significant.  I generally advise people to use xs:token instead so that content
> can be authored safely, though with the automated nature of this format, this
> is unlikely to be a significant factor.
> 
> The schema defines a number of enums, which seem like they might be problematic
> if you ever need to extend the value space.  I'm looking at DKIMResultType and
> SPFResultType as prime examples of something that might need to be extended. 
> In these case, I generally recommend xs:token as well, pointing at a registry
> for the valid values.

> The schema definition for TestingType is a boolean, but it doesn't use xs:bool.
>  Why?  Same for DMARCResultType.

This mirrors the values defined for the "t" tag in dmarcbis.
The schema is adapted from RFC 7489, and is minimally adjusted for
compatibility reasons.

> Why do the dates in the format not use the XML xs:dateTime construction?

We do not use dates directly, but seconds since the epoch, which are
easier to work with programmatically.
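
A small Python sketch of what I mean: arithmetic on the raw integers
is trivial, and converting to a calendar date for display is a single
call. The variable names begin and end and the sample values are made
up for illustration.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Made-up sample values for one reporting period.
begin, end = 1732665600, 1732752000

period_hours = (end - begin) // 3600   # plain integer arithmetic
print(period_hours, epoch_to_utc(begin).isoformat())
```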

> Why is there a specific <extension> container at the top level, but not for
> each record?  I would have thought that extending in the same way for each
> would be better.

Good question, but does it matter?
Right now it's defined to be in its own XML namespace only.
We'll think about it.

> # Small stuff
> 
> S1 mentions terminology that might be better moved to S1.1
> 
> S2.2 says "There MAY be optional sections for extensions within the document."
> <- this is not a "MAY", this is either an "is" or "is not" (I'm guessing "is").

New:
   The document format supports optional elements for
   extensions.  The absence or existence of such elements
   SHOULD NOT create an error when processing reports.
   This will be covered in a separate section.

> A few lines in the appendix are too long for the RFC format (I see one at 75
> characters).

I'll send patches.

> In the acknowledgments, this looks like a serious error: "Kvå (U+00E5)l".  Are
> you using <u>?

My name is... interesting to deal with, wrt. foreigners. Hence the From
header: "Daniel K." The name may be transliterated as "Kvaal" when the
ASCII gods are asserting their will.

I see the ball is rolling on the dmarc list regarding some of the things
I did not address here. Hopefully better answers will be forthcoming.

Daniel K.
