Re: IETF Policy on dogfood consumption or avoidance - SMTP version

Keith Moore <moore@xxxxxxxxxxxxxxxxxxxx> · Wed, 18 Dec 2019 18:03:17 -0500



    On 12/18/19 3:12 AM, Eliot Lear wrote:

    
      On the general issue, as the late Brian Kantor said,
        RFCs serve us best when they document existing practice.  

      
    I do not think this is defensible as a general statement, at
      least not when stated in that way.   There are cases when this is
      true, such as when existing practice has evolved to figure out
      what actually works well, and RFCs have evolved to document it.  
      But that neglects the very purpose of engineering of our
      protocols, which is to determine what will work well before it
      becomes existing practice.   Engineering is imperfect, of course,
      which is why we revise those RFCs in light of experience, but good
      engineering produces good practice much more often than not.   

    
    Sometimes our first RFC documents a protocol that is already in
      use, in which case we have a dilemma - should the RFC document
      existing practice or should it document what is believed to be
      good practice?   I have seen many instances in which this conflict
      produces poor specifications, in which either documenting existing
      practice, or specifying what is believed to be good practice,
      would produce a more useful result than trying to do both.   It is
      imperative that the WG in such a case have clear instructions as
      to which it should do.   But it's not always the case that the WG
      should favor existing practice.
    Another problem with that statement is that, if taken as true on
      its face, _any_ existing practice, no matter how poorly chosen or
      rarely employed, can be used to argue for ignoring an RFC or
      changing text in the RFC when it is revised.

    
    In the case of SMTP, of course, many changes made between RFC 821
      and RFC 5321 reflect lessons learned in the ~26 years of
      experience with SMTP during that interval, just as RFC 821
      reflected several years of experience with email-over-FTP prior to
      the invention of SMTP.    The rule in RFC 5321 that permits an IP
      address literal in HELO/EHLO is one such example.   Servers that
      rejected messages based on HELO/EHLO were found to frequently
      reject valid mail.   Perhaps unfortunately, RFC 5321 only reflects
      the change learned from that experience and does not record the
      experience that led to that decision - so people today may naively
      believe that the language in 5321 is a mistake or accident.   It
      was neither of those.
    Email usage does continue to change, so a decision should not be
      taken as valid forever just because it was once sound.   This is
      equally true, BTW, for the language in RFC 5321, as it is for an
      operational change to reject EHLO with IP address literals.

    
    I do support empowering operators, in exigent circumstances, to
      do whatever appears to be necessary to allow successful operation
      to continue.  But the process should not stop there.   When the
      effect of such decisions accumulates over time, and the corrective
      measures continue to be justified because "they've been that way
      forever", the result is generally to degrade the ability of the
      network to support applications and changing usage patterns.   So
      IMO there's a need to document the reasons for such measures, and
      to measure and track their effectiveness over time.   And because
      this check happens before the message content is actually
      transmitted, the only way to measure effectiveness of a rejection
      on EHLO arguments is to let some statistically representative
      sample of such messages through from time to time and actually
      measure how valid a check it turns out to be.   
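
    As a rough illustration of the sampling idea above, here is a
    minimal Python sketch.  The names (SampleStats, should_bypass,
    sample_rate) are hypothetical, and it assumes that some later stage
    (content filtering, user reports, manual review) eventually labels
    each let-through message as spam or not.  It is not taken from any
    real MTA.

```python
import random
from dataclasses import dataclass


@dataclass
class SampleStats:
    """Outcomes for sessions that failed the EHLO check but were let through."""
    sampled: int = 0
    spam: int = 0

    def record(self, was_spam: bool) -> None:
        # Called once a later stage has labeled the let-through message.
        self.sampled += 1
        if was_spam:
            self.spam += 1

    def precision(self) -> float:
        """Fraction of sampled would-be rejections that really were spam."""
        if self.sampled == 0:
            raise ValueError("no sampled messages yet")
        return self.spam / self.sampled


def should_bypass(sample_rate: float, rng: random.Random) -> bool:
    """Decide whether a session failing the EHLO check is let through anyway.

    sample_rate is the (assumed) fraction of failing sessions to sample,
    e.g. 0.01 for one in a hundred.
    """
    return rng.random() < sample_rate
```

    If precision() stays close to 1.0 over time, the check is rejecting
    mostly spam; if it drifts downward, the check is rejecting a growing
    share of valid mail and should be revisited.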

    
      This is entirely in accord with the “running code”
        part of our mantra: if network needs dictate that the RFC not be
        followed to the letter, then so be it.  I would actually like
        that we are demonstrating this point and reinforcing that our
        standards are voluntary, but for the fact that I don’t think
        the standard was substantially violated.  That’s my second point.
      
    
    This is not "running code" in the sense that we usually use the
      term.   It's not a change made by a well-supported SMTP server.  
      It's a configuration change made by one site for which very little
      evidence to support it has been cited.   Even then, the assertion
      was that this configuration was more efficient, not that it
      resulted in a better overall ham-to-spam ratio.   And of course
      IETF is not just an ordinary site, it sets (or should set) an
      example for others to follow.  
    
      
      If one looks at RFC 5321, there are three reasons to
        think that really the standard does not prohibit the behavior
        being described.  First, the text in 4.1.4 doesn’t actually
        prohibit rejecting literals.  Here is what is said:
      

             If the EHLO command is not acceptable to the SMTP server, 501, 500,
             502, or 550 failure replies MUST be returned as appropriate.
      
      
        At this point in the transaction, it may not be readily
          apparent that the EHLO indeed is unacceptable.  That’s because
          the server may need to gather more information, such as the
          recipient, in order to check against SPF records or other
          inputs delivered later to make a decision.
      
    
    I disagree.   The EHLO argument is immaterial to any check on
    validity of a recipient address.   And one subtlety of SMTP is that
    a response to a RCPT TO command is generally taken as a response for
    that specific recipient, so it's generally not appropriate to
    complain about EHLO in response to RCPT TO.    
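
    To make the per-recipient point concrete, here is a minimal sketch
    of how a sending client typically reads RCPT TO replies.  The
    function name and the example addresses are hypothetical; the only
    assumption is the usual convention that a 2xx reply means the
    recipient was accepted.

```python
def accepted_recipients(rcpt_replies):
    """Return the recipients whose RCPT TO reply was a 2xx success.

    rcpt_replies is a list of (recipient, reply_code) pairs, one pair
    per RCPT TO command, in the order the commands were sent.
    """
    return [rcpt for rcpt, code in rcpt_replies if 200 <= code < 300]


replies = [("alice@example.org", 550), ("bob@example.org", 250)]
print(accepted_recipients(replies))
```

    A client following this logic records the 550 against alice alone
    and goes on to deliver to bob, so a complaint about EHLO delivered
    at this point would be read as a problem with one mailbox.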

    
             An SMTP server MAY verify that the domain name argument in the EHLO
             command actually corresponds to the IP address of the client.
             However, if the verification fails, the server MUST NOT refuse to
             accept a message on that basis.
        
        
        That “MUST NOT” conflicts with the most important principle
          we have to go with in order to defend against spam and
          malware, as clearly stated in Section 7.9:
        

            It is a well-established principle that an SMTP server may refuse to
            accept mail for any operational or technical reason that makes sense
            to the site providing the server.
      
    
    The MUST NOT exists for the very reason that rejection of
      messages because of EHLO arguments was found to be detrimental to
      interoperability, because too many SMTP servers were conducting
      inappropriate checks.
    
      
        And so the server chose to reject a message.  I would add
          that an erratum should be filed to resolve this conflict.
      
    
    I disagree.   Or at least, I don't think IETF should dismiss the
      many person-decades of experience that went into RFC 5321, based
      on a single assertion by one operator that they found it
      beneficial to do so.   I do think IETF should probably research
      the matter, but I'm not sure that an erratum is appropriate.  

    
        Finally, we cannot CANNOT CANNOT micromanage secretariat
          and mailing list operations on the IETF list. 
      
    
    I mostly agree, but this is still a red flag.   When the
      secretariat or operators believe they see a need to violate IETF
      consensus, they should at least explain to IESG why they are doing
      so, and IESG should refer the matter to the appropriate area or
      working group for investigation.    This is partly because such
      deviations are potentially valuable feedback for IETF standards,
      but also because it reflects poorly on our organization and impairs
      the perceived value of our standards when we refuse to eat our own
      dog food.   

      
      And it should be fine to discuss these things on the IETF list or
      on an appropriate topical IETF mailing list.   But neither the
      IETF list nor an IETF WG should be trying to issue any kind of
      immediate instructions to IETF operations personnel, unless asked
      to do so by the IESG.

    
        This does NOT impact open participation, when anyone can
          get free email service on any number of platforms that work
          just fine with the IETF.  That this message didn’t meet the
          extremely low barrier of setting up a PTR record when > 99%
          of SMTP client connections are best classed as attacks.  

        
    As I understand the situation, that wasn't the problem.  
      Messages were/are being rejected merely because they used IP
      address literals in EHLO, whether or not there was a PTR record.
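
    For readers unsure what counts as an address literal here, the
    following Python sketch classifies an EHLO argument.  It assumes the
    RFC 5321 forms "[192.0.2.1]" and "[IPv6:2001:db8::1]" and does not
    handle the general address literal form; the function name is
    hypothetical.  Note that it performs no DNS lookup at all, which is
    the point: a check of this kind is independent of whether a PTR
    record exists.

```python
import ipaddress


def is_address_literal(ehlo_arg: str) -> bool:
    """Return True if ehlo_arg looks like an IPv4 or IPv6 address literal."""
    if not (ehlo_arg.startswith("[") and ehlo_arg.endswith("]")):
        return False
    inner = ehlo_arg[1:-1]
    try:
        if inner[:5].lower() == "ipv6:":
            # IPv6 literals carry an "IPv6:" tag inside the brackets.
            ipaddress.IPv6Address(inner[5:])
        else:
            ipaddress.IPv4Address(inner)
    except ValueError:
        return False
    return True


for arg in ("[192.0.2.1]", "[IPv6:2001:db8::1]", "mail.example.com"):
    print(arg, is_address_literal(arg))
```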
    
      
        That THAT isn’t recognized as THE problem the IETF should
          work on with regard to email is disturbing.
      
    
    To me the disturbing thing is that so many operators feel
      empowered to impose arbitrary and often meaningless checks on
      email in the name of filtering supposed spam, checks that are
      quite often not backed up by careful and continued measurement and
      appropriate use of statistics.    But again, this may be due to a
      failure of IETF to make appropriate protocol changes and/or
      operational recommendations to facilitate more accurate spam
      filtering.    IMO the "operators can do whatever they want" idea
      has degraded interoperability for far too long, but we can't
      really expect operators to do better unless we give them better
      tools.
    Keith