The fix for that is to implement the new feature and get customers to log bug reports against the other implementation. It does work. Even Fortune 500 companies listen to their customers. Implementations can be fixed, and with regular update mechanisms in place the fixes do get deployed.

In early 2018 the open source DNS vendors declared a flag day on Feb 1, 2019. We were no longer going to treat timeout as "the server doesn't understand EDNS". I.e. we were intending to deliberately break interoperation with servers that were stuck in the last century.

We set up a site where you could test your servers to see if your servers were broken. We also had that site report a whole lot of other EDNS protocol violations. You got RED if there was a fatal fault, ORANGE if any other fault was detected, and GREEN if the server passed the test queries. The number of sites which would get RED was << 1 in 10000.

We advertised this flag day in a number of places. The testing server got overwhelmed, and DNS vendors with broken servers got thousands of "this site says your product is broken, when are you going to fix it" complaints. Implementations got fixed and the fixes got deployed. This included firewall vendors fixing their default blocking rules.

https://ednscomp.isc.org/ has lots of graphs which show failures of various populations of servers and how things changed in the lead up to Feb 1, and later fixes being released and deployed. In particular https://ednscomp.isc.org/compliance/ts/au.optfail.html shows Microsoft fixing their servers (see the "echoed" line). Jan 28 was when they fixed their Azure service. The drop around April 11 is when the fix pushed out in the March patch Tuesday release started to take effect. The March 1 failure spike was when one of the links from the testing site started dropping DNS UDP packets but passed DNS TCP packets. It took a couple of days to track down the bad (wrong style for the conductor) RJ48 connector that caused that failure.
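The behavioural change behind the flag day can be sketched in a few lines. This is an illustrative sketch only (the helper `send_query` is hypothetical, not any vendor's actual code): pre-flag-day resolvers treated a timeout on an EDNS query as "the server doesn't understand EDNS" and retried without the OPT record; post-flag-day, a timeout is treated as a plain server failure and surfaces to the caller.

```python
def query_with_edns_fallback(send_query, flag_day=True):
    """Sketch of resolver EDNS fallback logic.

    `send_query(edns=...)` is a hypothetical helper that sends one DNS
    query, with or without an EDNS OPT record, and raises TimeoutError
    if no response arrives.
    """
    try:
        # Always try EDNS first; it has been mandatory to handle
        # (at least by ignoring the OPT record) since RFC 2671 (1999).
        return send_query(edns=True)
    except TimeoutError:
        if flag_day:
            # Post flag day: a timeout means the server (or a
            # middlebox in front of it) is broken -- do not mask it.
            raise
        # Legacy workaround: assume the timeout meant "doesn't
        # understand EDNS" and retry as plain DNS.
        return send_query(edns=False)
```

With `flag_day=False` a server that drops EDNS queries still gets answers via the plain-DNS retry; with `flag_day=True` the same server simply fails, which is what pushed operators to fix their implementations and firewall rules.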
> On 8 May 2019, at 8:53 am, Henning Schulzrinne <hgs@xxxxxxxxxxxxxxx> wrote:
>
> It's kind of the inverse robustness problem - lots of perfectly-well documented protocol features are not usable in practice since they break "lazy", but important, implementations. The SIP crowd will remember the endless discussions about MIME multipart.
>
> In those cases, it would probably be wise to simply declare that a well-intentioned feature is no longer recommended for new implementations and possibly look for alternative options. (We seem to get into the circular arguments of "We need X", "use standard mechanism M to do X", "but none of the major implementations do M and are unlikely to in the foreseeable future", "but we cannot have two ways to do X and M is actually a better idea"; wait two years; repeat. Alternative: people develop a kludge that kind of works.)
>
> The old, rarely-exercised, promotion-to-Draft-Standard step was supposed to catch this, but that protocol path obviously was also rarely exercised.
>
> Henning
>
>
> And then for a counter example, also related to IPv6. The IPv6 specs
> allows implementations to insert a variety of intermediate headers
> between the IPv6 header and the transport packet, but many router
> implementations just don't like that and slow down or drop packets if
> such intermediate routers are present. A case where the spec is arguably
> too permissive, or the implementations too strict.
>
> -- Christian Huitema

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka@xxxxxxx