Digression on DANE for MTAs implementation difficulty, followups off-list are likely best.

Viktor Dukhovni <ietf-dane@xxxxxxxxxxxx> · Wed, 11 Apr 2018 21:35:56 -0400

> On Apr 11, 2018, at 11:40 AM, ned+uta@xxxxxxxxxxx wrote:
> 
> I've also looked at implementing DANE, and IMO it's a major PITA to implement,
> so much so that it would take substantial customer interest to make me do it -
> interest that has not materialized.

[ Not really on topic for this LC, so follow-ups off-list if there are further
  questions to either Ned or me. ]

If your TLS library is OpenSSL 1.1.0 or later, then DANE support is included.
All you need to do is locate and retrieve any associated usable TLSA records,
and OpenSSL will verify the peer chain against those.  Since MTAs already have
DNS-specific code for MX records, ... adding TLSA lookups is fairly simple.

Manpage with code sample at:

  https://www.openssl.org/docs/man1.1.0/ssl/SSL_CTX_dane_enable.html

At which point DANE boils down to, per "MX host":

  * Obtain address (A/AAAA) records for hostname via DNS
  * If the address records are insecure and a CNAME was traversed,
    perform a CNAME query to obtain the security status of the
    original name
  * If all (address and any CNAME) are insecure, no DANE
  * If the address records are secure post CNAME expansion, check
    for secure TLSA RRs at _25._tcp + expanded hostname. If found,
    use those.
  * If none found, or only the original name is secure, check for
    TLSA RRs at _25._tcp + original hostname.  If found, use
    those, else no DANE.
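
The per-host steps above can be sketched roughly as follows.  This is
only an illustration, not code from Postfix or Exim: the names
tlsa_qname and select_tlsa, and the lookup callback, are hypothetical,
and the caller is assumed to have already done the address and CNAME
queries and recorded their DNSSEC status.

```python
from typing import Callable, List, Optional

# Hypothetical callback: given a TLSA query name, return the secure
# (DNSSEC-validated) TLSA RRs found there, or an empty list if none.
TlsaLookup = Callable[[str], List[str]]


def tlsa_qname(hostname: str, port: int = 25) -> str:
    """Build the TLSA query name, e.g. _25._tcp.mx.example.com."""
    return f"_{port}._tcp.{hostname.rstrip('.')}"


def select_tlsa(original: str,
                expanded: str,
                addresses_secure: bool,
                original_secure: bool,
                lookup: TlsaLookup) -> Optional[List[str]]:
    """Return the TLSA RRs to use for one MX host, or None for "no DANE".

    original         -- MX hostname as found in the MX record
    expanded         -- hostname after CNAME expansion (same as original
                        if no CNAME was traversed)
    addresses_secure -- address records validated after CNAME expansion
    original_secure  -- original name validated (implied whenever
                        addresses_secure is true)
    """
    if not addresses_secure and not original_secure:
        return None                      # everything insecure: no DANE

    candidates: List[str] = []
    if addresses_secure:
        candidates.append(expanded)      # try the expanded name first
    if original not in candidates:
        candidates.append(original)      # then fall back to the original

    for name in candidates:
        rrs = lookup(tlsa_qname(name))
        if rrs:
            return rrs
    return None                          # nothing found: no DANE
```

The lookup callback is expected to raise on SERVFAIL or timeout rather
than return an empty list, so that lookup errors are not confused with
"no TLSA records".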

If TLSA lookups error out (not NXDOMAIN but SERVFAIL, timeout, ...)
then skip the host and try the next MX, or defer.
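
A minimal sketch of that rule, assuming the MX hosts are tried in
preference order.  The TlsaStatus values and the plan_delivery function
are made up for illustration, and connection failures are left out to
keep it short.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class TlsaStatus(Enum):
    FOUND = "found"            # usable secure TLSA RRs exist
    NO_RECORDS = "no_records"  # e.g. NXDOMAIN: proceed without DANE
    ERROR = "error"            # SERVFAIL, timeout, ...: do not proceed


def plan_delivery(mx_hosts: Iterable[str],
                  tlsa_status: Callable[[str], TlsaStatus]
                  ) -> Optional[Tuple[str, bool]]:
    """Pick the first MX host whose TLSA lookup did not error out.

    Returns (host, use_dane), or None if every host had to be skipped,
    in which case the message is deferred.
    """
    for host in mx_hosts:
        status = tlsa_status(host)
        if status is TlsaStatus.ERROR:
            continue                     # skip this host, try the next MX
        return host, status is TlsaStatus.FOUND
    return None                          # nothing usable: defer
```

The point of the distinction is that a lookup error must never be
treated as "no TLSA records", since that would let an attacker who can
disrupt DNS force a downgrade to unauthenticated delivery.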

With DANE there's no HTTPS callout, no persistent cache, no downgrade on first
contact.  No periodic preemptive policy refresh (to avoid downgrade
near expiration), no expedited policy refresh on authentication
failure, ...   A correct STS implementation has rather a lot of
delicate persistent state.  And the SMTP client needs to not only
do policy lookups against its cache, but also trigger policy
refresh on connection failure, and policy confirmation on initial
success.  It is also not 100% clear whether a policy which
initially fails on the first MX host tried, but succeeds on the
second is a valid policy to cache.  I think the intent is that it
is, but clearly the domain's configuration is degraded, so it is
a bit of a judgement call.

The policy cache needs to avoid retrying too often on failure to
reach the HTTPS server, returning the status quo rather than
timing out the lookup at each delivery.  So there's short-term
state about recent failed attempts, as well as long-term state
for cached policies, ...  So no shortage of moving parts.
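
To make the two kinds of state concrete, here is a rough sketch of such
a cache.  Everything in it is an assumption made for illustration: the
class and parameter names, the fetch callback (standing in for the HTTPS
policy retrieval and raising OSError on failure), the retry interval and
the refresh margin.  Persistence, policy validation and refresh on
authentication failure are all left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    policy: Optional[str] = None  # long-term state: cached policy body
    expires: float = 0.0          # long-term state: when the policy expires
    retry_after: float = 0.0      # short-term state: no refetch before this


class PolicyCache:
    def __init__(self,
                 fetch: Callable[[str], Tuple[str, float]],
                 retry_interval: float = 300.0,
                 refresh_margin: float = 3600.0,
                 now: Callable[[], float] = time.time) -> None:
        self._fetch = fetch                    # returns (policy, max_age)
        self._retry_interval = retry_interval  # back-off after a failure
        self._refresh_margin = refresh_margin  # refresh this early
        self._now = now
        self._entries: Dict[str, Entry] = {}

    def lookup(self, domain: str) -> Optional[str]:
        """Return the unexpired cached policy for domain, or None."""
        now = self._now()
        entry = self._entries.setdefault(domain, Entry())
        valid = entry.policy is not None and now < entry.expires
        due = not valid or now >= entry.expires - self._refresh_margin
        # Only contact the HTTPS server if a refresh is due AND we are
        # not still backing off from a recent failed attempt.
        if due and now >= entry.retry_after:
            try:
                policy, max_age = self._fetch(domain)
            except OSError:
                entry.retry_after = now + self._retry_interval
            else:
                entry.policy = policy
                entry.expires = now + max_age
                entry.retry_after = 0.0
        # Status quo: whatever is cached, provided it has not expired.
        if entry.policy is not None and now < entry.expires:
            return entry.policy
        return None
```

Even this toy version carries three timestamps per domain, and it does
not yet survive a restart.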

Each of DANE and STS has its implementation challenges.  It
rather depends on one's comfort zone.  The DANE footprint in
Exim is quite small, not counting the contributed general-purpose
library for X.509 à la DANE that predates support for the same in
OpenSSL 1.1.0, and works with OpenSSL 1.0.0 or later.

In Postfix the DANE code is larger, because that library
was originally developed as part of Postfix, and because we
also used the DANE code to re-implement the pre-existing
support for "fingerprint" based peer authentication, and
to support TLS session resumption, which has to ensure that
resumed sessions match the destination policy.

So if Exim is a guide to implementation complexity for
DANE, the DANE story looks pretty good.  The amount of
DANE-specific Exim code is quite modest.

-- 
	Viktor.