Hi David,

On Mon, Jun 26, 2017 at 3:04 PM, Black, David <David.Black@xxxxxxxx> wrote:
> Adding some comments on ECN and DSCP ...
>
>> > Section 4.3:
>> >
>> >    TRILL over IP implementations MUST support setting the DSCP value in
>> >    the outer IP Header of TRILL packets they send by mapping the TRILL
>> >    priority and DEI to the DSCP. They MAY support, for a TRILL Data
>> >    packet where the native frame payload is an IP packet, mapping the
>> >    DSCP in this inner IP packet to the outer IP Header with the default
>> >    for that mapping being to copy the DSCP without change.
>> >
>> > I think it is fine to require that implementations are capable of setting
>> > DSCP values on the outer IP header. However, I fail to see any discussion of
>> > the potential issues with actually setting the DSCP values. It is one thing to
>> > do this in an IP backbone use case where one can know and have control
>> > over the PHB that the DSCP values map to. But otherwise, over general
>> > internet the behavior is not that predictable. One can easily be subject
>> > to policers or remapping. Also as the actual DSCP code point usage is
>> > domain specific this is difficult. Priority reversal is likely the least
>> > of the problems that this can run into over general Internet.
>>
>> It sounds like appropriate discussion and warnings about these issues
>> would resolve the above comment.
>
> For ECN, see RFC 6040 and draft-ietf-tsvwg-rfc6040update-shim. In particular,
> copying the inner ECN codepoint to the outer IP header encapsulation without
> requiring decapsulation processing as specified in RFC 6040 or the 6040update-shim
> draft can lose congestion indications from the network and hence is wrong
> (it's also wrong wrt RFC 3168, but RFC 6040 and the 6040update-shim draft are
> better and more current references).

That's a good point.
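To make the ECN point concrete, here is a minimal sketch in Python of the
decapsulation combination that RFC 6040 specifies (its Figure 4), next to
the "ignore the outer header" behavior that loses congestion marks. This is
an illustration only: the function names are hypothetical and nothing here
is taken from the TRILL over IP draft.

```python
# ECN codepoints as two-bit values (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encap_ecn(inner):
    """RFC 6040 'normal mode' ingress: copy the inner ECN field to the outer header."""
    return inner


def decap_ecn(inner, outer):
    """Combine inner and arriving outer ECN fields per RFC 6040 Figure 4.

    Returns the ECN field for the forwarded inner packet, or None if the
    packet has to be dropped.
    """
    if outer == CE:
        # Congestion was signalled on the tunnel path. It can only be
        # propagated if the inner packet is ECN-capable; otherwise drop.
        return None if inner == NOT_ECT else CE
    if inner in (NOT_ECT, CE):
        # Not-ECT stays Not-ECT; an inner CE mark is never cleared.
        return inner
    if outer == ECT1:
        # Inner ECT(0) or ECT(1) with outer ECT(1) yields ECT(1).
        return ECT1
    # Outer Not-ECT or ECT(0): keep the inner ECT codepoint.
    return inner


def naive_decap_ecn(inner, outer):
    """The problematic behavior: ignore the outer header entirely."""
    return inner


if __name__ == "__main__":
    outer = CE  # a router on the IP path marked the outer header
    print(decap_ecn(ECT0, outer) == CE)          # mark is propagated
    print(naive_decap_ecn(ECT0, outer) == ECT0)  # mark is silently lost
```

The naive variant shows why copying at ingress is not enough on its own: a
CE mark set on the outer header along the IP path never reaches the inner
packet unless the egress combines the two fields.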
> For DSCPs, start with RFC 2983 - thinking about the validity (or likely validity)
> of the outer DSCP at the decapsulator may help in choosing whether to
> recommend a uniform model (e.g., copy DSCP out at ingress, copy back in at
> egress) or a pipe model (e.g., do something reasonable for outer DSCP at
> ingress, ignore it on egress) as the implementation default.

I believe the default behavior in the current draft is the best
default. That sets DSCP based on the same TRILL Header indicia that
control default QoS on non-IP links.

> -- DSCP mapping to/from TRILL/Ethernet priorities
>
>> The intent in the draft is to reflect the default relative priority of
>> the different priority code points in IEEE Std 802.1Q where priority 1
>> is lower than priority 0. At a quick look, it appears to me that RFC
>> 2474 requires that binary 001000 be handled as being of a priority not
>> lower than the priority with which binary 000000 is handled. Yet RFC 3662,
>> which you point to, seems to suggest using 001000 as a lower
>> priority code point than 000000. Given that 3662 not only does not
>> update 2474 but is only Informational while 2474 is Standards Track, I
>> would say that 2474 dominates and that this draft makes the best
>> assumptions it can about default behavior...
>
> Well ... that's a discussion about text in RFCs that are well over a decade
> old, and in an area (less-than-best-effort service) where the aspirations
> of at least RFC 3662 weren't realized ... but that RFC is not safe to ignore,
> either.
>
> In practice, the specification of CS1 for less-than-best-effort service has
> been promulgated by RFC 4594 rather than RFC 3662, and RFC 4594 has
> had significant "running code" impact on network design and operation.
>
> As Magnus mentioned RFC 7657, I strongly suggest starting from the
> RFC 7657 discussion of this topic in order to figure out what to do.
> I'm not sure what to recommend, but I do think that starting from
> RFC 7657 (rather than RFC 2474 and RFC 3662) is the better approach.

OK.

> FWIW, the TSVWG WG is in the process of figuring out which DSCP
> to recommend for less-than-best-effort-service in place of CS1 - that's
> likely to be an active topic of discussion in Prague.

I'll try to attend that session.

Thanks,
Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@xxxxxxxxx

> Thanks, --David
>
>> -----Original Message-----
>> From: Tsv-art [mailto:tsv-art-bounces@xxxxxxxx] On Behalf Of Donald
>> Eastlake
>> Sent: Sunday, June 25, 2017 8:07 PM
>> To: Magnus Westerlund <magnus.westerlund@xxxxxxxxxxxx>
>> Cc: tsv-art@xxxxxxxx; draft-ietf-trill-over-ip.all@xxxxxxxx; IETF Discussion
>> <ietf@xxxxxxxx>; trill@xxxxxxxx
>> Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-trill-over-ip-10
>>
>> Hi Magnus,
>>
>> Thanks for the extensive review. See my responses below.
>>
>> On Thu, Jun 15, 2017 at 1:32 PM, Magnus Westerlund
>> <magnus.westerlund@xxxxxxxxxxxx> wrote:
>> >
>> > Reviewer: Magnus Westerlund
>> > Review result: Not Ready
>> >
>> > Early review of draft-ietf-trill-over-ip-10
>> >
>> > TSV-ART review comments:
>> >
>> > I have set this to not ready as there are several issues, some significant that
>> > could affect the protocol realization significantly. Some may be me missing
>> > things in TRILL, I was not that familiar with it before this review and I have
>> > only tried looking up things, not reading the whole earlier specifications. So
>> > don't hesitate to push back and provide pointers to things that can resolve
>> > issues. The authors and the WG clearly have thought about a lot of issues
>> > and dealt with much already.
>>
>> OK. Hopefully we can resolve these one way or the other.
>>
>> > Diffserv usage
>> > --------------
>> >
>> > Section 4.3:
>> >
>> >    TRILL over IP implementations MUST support setting the DSCP value in
>> >    the outer IP Header of TRILL packets they send by mapping the TRILL
>> >    priority and DEI to the DSCP. They MAY support, for a TRILL Data
>> >    packet where the native frame payload is an IP packet, mapping the
>> >    DSCP in this inner IP packet to the outer IP Header with the default
>> >    for that mapping being to copy the DSCP without change.
>> >
>> > I think it is fine to require that implementations are capable of setting
>> > DSCP values on the outer IP header. However, I fail to see any discussion of
>> > the potential issues with actually setting the DSCP values. It is one thing to
>> > do this in an IP backbone use case where one can know and have control over
>> > the PHB that the DSCP values map to. But otherwise, over general internet the
>> > behavior is not that predictable. One can easily be subject to policers or
>> > remapping. Also as the actual DSCP code point usage is domain specific this is
>> > difficult. Priority reversal is likely the least of the problems that this can
>> > run into over general Internet.
>>
>> It sounds like appropriate discussion and warnings about these issues
>> would resolve the above comment.
>>
>> > Section 4.3:
>> >
>> >    The default TRILL priority and DEI to DSCP mapping, which may be
>> >    configured per TRILL over IP port, is as follows. Note that the DEI
>> >    value does not affect the default mapping and, to provide a
>> >    potentially lower priority service than the default priority 0,
>> >    priority 1 is considered lower priority than 0. So the priority
>> >    sequence from lower to higher priority is 1, 0, 2, 3, 4, 5, 6, 7.
>> >
>> >       TRILL Priority  DEI  DSCP Field (Binary/decimal)
>> >       --------------  ---  -----------------------------
>> >              0        0/1     001000 / 8
>> >              1        0/1     000000 / 0
>> >              2        0/1     010000 / 16
>> >              3        0/1     011000 / 24
>> >              4        0/1     100000 / 32
>> >              5        0/1     101000 / 40
>> >              6        0/1     110000 / 48
>> >              7        0/1     111000 / 56
>> >
>> > This appears to be a problematic mapping. At least for prio 0 and 1. As
>> > priority 1 appears to be intended to be higher than priority 0, it is
>> > interesting that it is mapped to CS1, which to quote
>> > https://datatracker.ietf.org/doc/rfc7657/:
>> >
>> >    CS1 ('001000') was subsequently designated as the recommended
>> >    codepoint for the Lower Effort (LE) PHB [RFC3662].
>> >
>> > So what is proposed can, in a network using default mapping, result in
>> > priority 0 being lower priority than 1. Plus, in some networks this can
>> > also result in strange remapping that results in a different PHB for CS1
>> > than.
>>
>> The intent in the draft is to reflect the default relative priority of
>> the different priority code points in IEEE Std 802.1Q where priority 1
>> is lower than priority 0. At a quick look, it appears to me that RFC
>> 2474 requires that binary 001000 be handled as being of a priority not
>> lower than the priority with which binary 000000 is handled. Yet RFC 3662,
>> which you point to, seems to suggest using 001000 as a lower
>> priority code point than 000000. Given that 3662 not only does not
>> update 2474 but is only Informational while 2474 is Standards Track, I
>> would say that 2474 dominates and that this draft makes the best
>> assumptions it can about default behavior...
>>
>> > MTU and Fragmentation
>> > ---------------------
>> >
>> > I think there are two main issues here. The first one is MTU discovery
>> > of the actual IP path MTU between the ports. That will be needed to prevent
>> > a lot of traffic going into MTU black holes.
>> > Especially as TRILL requires 1470 byte support which is likely above
>> > a lot of paths.
>>
>> Seems like it would depend on the environments where TRILL was used.
>> For example, I do not think 1470 would be a problem in most Data
>> Center or Internet Exchange point uses. Data Centers
>> sometimes support 9K jumbo frames and the like.
>>
>> In fact, it is probably bad to focus too much on 1470 -- that is a
>> required minimum to be sure that reasonable size link state PDUs can
>> be successfully flooded through the TRILL campus so that routing will
>> work. However, it would commonly be the case that, for the TRILL
>> campus to be useful in a particular case, links need to be able to
>> carry the expected size TRILL Data packets. For example, if there were
>> two parts of a TRILL campus connected by one or a few TRILL over IP
>> links and the end stations in each part were assuming they could use
>> 1500 byte Ethernet packets, then the TRILL over IP links would need to
>> support an MTU based on 1500 + TRILL Header + IP and TRILL over IP
>> encapsulation. And more if security was being used or there were any
>> other reasons for additional headers/encapsulation...
>>
>> > Section 8.4:
>> >
>> >    Path MTU discovery [RFC4821] should be useful
>> >    in determining the IP MTU between a pair of RBridge ports with IP
>> >    connectivity.
>> >
>> > The issue with RFC4821 is that it has requirements on the packetization layer.
>> > TRILL appears to have several components that are useful. However, it will
>> > require a specification of the procedure to result in a useful tool.
>>
>> See below.
>>
>> > Section 8.4:
>> >
>> >    TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in
>> >    [RFC7177], can be used to obtain added assurance of the MTU of a
>> >    link.
>> >
>> > Yes, that can confirm working MTUs that are at 1470 or above, but appears
>> > prevented from working below 1470?
>>
>> While there is a minimum size for TRILL IS-IS MTU PDUs, determined by
>> header size, it is well below 1470, probably (depending on whether
>> security is in use, etc.) below 150 bytes.
>>
>> > Thus, it appears that there is a lack of mechanism here to actually get a valid
>> > and functional MTU from TRILL in the cases where the Path MTU is below 1470. If
>> > I am wrong good, but I think this is an important piece for how to handle the
>> > next main issue.
>>
>> How about referencing Section 3 of
>> https://tools.ietf.org/html/draft-ietf-trill-mtu-negotiation-05
>> which is currently in IETF Last Call? (The wording of that section is
>> probably going to be improved based on an OPS review by Brian
>> Carpenter.)
>>
>> > UDP encapsulation and IP fragments.
>> > ----------------------------------
>> > I see it as a big issue that UDP encapsulation is the native one, and that
>> > relies on IP fragmentation despite the need for reliable fragmentation. With
>> > the setup of having to support 1470 MTU on TRILL level some packets will be
>> > fragmented in many environments. That will lead to a lot of losses, and as
>> > discussed below a very big problem with middleboxes. The main problem here is
>> > that if one tries to rely on IP fragments one will have issues with packets
>> > ending up in black holes. And different problems depending on IPv4 or IPv6.
>> > IPv6 is likely the lesser problem assuming that one has working PMTUD.
>> >
>> > There are several ways out of this.
>> >
>> > 1. Detect issues and use TCP encapsulation with correctly set MSS to not
>> >    get IP fragments
>> > 2. Determine MTU and implement a fragmentation mechanism on top of UDP.
>>
>> So, I don't see that much problem with UDP being the general default
>> consistent with the TRILL philosophy of defaulting to needing zero or
>> minimal configuration. The default should be to use multicast Hellos
>> for discovery of neighbors which sure points at UDP to me.
>> Having to traverse a NAT should be a rare case. Since, in the NAT case,
>> you have to configure things related to the static binding and the IP
>> address(es) of peer(s) anyway, you can also configure to use a
>> different encapsulation than UDP, such as TCP, at the same time. I
>> don't see it as much of a problem if, by default, TRILL won't operate
>> through a NAT. If you are using UDP and it fragments and fragments are
>> dropped at a NAT, probably you can't exchange Hellos so you will not
>> form an adjacency and anything on the other side of the NAT will not
>> be visible.
>>
>> > Zero Checksum:
>> > --------------
>> >
>> > Section 5.4:
>> >
>> >    UDP Checksum - as specified in [RFC0768]
>> >
>> > Considering the fast path encapsulation desire, I am surprised to not see any
>> > mention of the use of zero checksum here. Raising the zero checksum and a
>> > forward reference would be good I think.
>> >
>> > And then Section 8.5:
>> >
>> >    The requirements for the usage of the zero UDP Checksum in a UDP
>> >    tunnel protocol are detailed in [RFC6936]. These requirements apply
>> >    to the UDP based TRILL over IP encapsulations specified herein
>> >    (native and VXLAN), which are applications of UDP tunnel.
>> >
>> > If you actually intended to allow zero checksum, then you actually should
>> > document that TRILL fulfills the requirements that the applicability statement
>> > raises. I have not analyzed how well it meets these requirements.
>> >
>> > Please review Section 6.2 of RFC 8086 for an example of how that can be done.
>>
>> OK. We'll look into it.
>>
>> > TCP Encapsulation issue
>> > -----------------------
>> >
>> > Section 5.6:
>> >
>> > The TCP encapsulation appears to be missing a delimiter format allowing each
>> > individual TRILL packet/payload to be read out of the TCP's byte stream. In
>> > other words, a normal implementation has no way of ensuring that the TCP
>> > payload starts with the start of a new TRILL payload.
>> > Multiple small TRILL payloads may be included in the same TCP payload,
>> > and also only parts, as TCP is one way of dealing with TRILL packets
>> > that are larger than the IP+Encapsulation MTU that actually will work.
>> >
>> > This comment is based on that there appear to be no length fields included in
>> > the TRILL header. The most straightforward delimiter is a 2-byte length field
>> > for the TRILL payload to be encapsulated.
>>
>> Right. It might also be useful to include some sort of check field, as
>> is done in BGP, to detect if you are out of sync in parsing the TCP
>> stream.
>>
>> Another point is that, while with UDP it seems fine to send packets
>> with assorted QoS, you don't want to encourage re-ordering of TCP
>> packets in a stream. So if TCP encapsulation is being used, you want
>> to use the same DSCP value for the packets in a particular TCP stream.
>> So, generally, you need to have a TCP connection per priority handling
>> category. Mapping the 8 priority levels into a smaller number of
>> handling categories is a normal thing to do so you certainly don't
>> necessarily need 8 TCP connections. Adding material on this should not
>> be too hard.
>>
>> > Section 5.6:
>> >
>> > TCP endpoint requirements. I do wonder if an application like TRILL actually
>> > would need to discuss performance impacting implementation choices or
>> > limitations. For example use of Nagle, the requirements on buffer sizes in
>> > relation to bandwidth-delay products, as buffer memory in an RBridge will
>> > impact performance.
>>
>> Well, I'm not sure how deeply this document should get into such
>> performance issues. What about just saying something about
>> consideration being given to tuning TCP for performance and pointing
>> to one or a few other RFCs that talk about this?
>>
>> > Congestion Control
>> > ------------------
>> > First thanks for the effort here.
>>
>> You're welcome.
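On the TCP delimiter point above, a minimal Python sketch of the 2-byte
length-prefix framing that was suggested. This is only an illustration under
stated assumptions: the names frame() and Deframer are hypothetical, the
network byte order of the length field is assumed, and the BGP-style check
field mentioned in the reply is left out. None of it comes from the draft.

```python
import struct


def frame(payload):
    """Prefix one TRILL payload with a 2-byte big-endian length (assumed format)."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit a 2-byte length field")
    return struct.pack("!H", len(payload)) + payload


class Deframer:
    """Recover whole payloads from a TCP byte stream delivered in arbitrary chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Add received bytes; return the list of payloads completed so far."""
        self._buf += data
        out = []
        while len(self._buf) >= 2:
            (length,) = struct.unpack_from("!H", self._buf, 0)
            if len(self._buf) < 2 + length:
                break  # the rest of this payload has not arrived yet
            out.append(bytes(self._buf[2:2 + length]))
            del self._buf[:2 + length]
        return out


if __name__ == "__main__":
    stream = frame(b"first TRILL packet") + frame(b"second")
    d = Deframer()
    # TCP may split the stream anywhere, including inside the length field.
    received = d.feed(stream[:1]) + d.feed(stream[1:9]) + d.feed(stream[9:])
    print(received)
```

The receiver never assumes that a TCP segment starts at a payload boundary,
which is the property the review comment asks for.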
>>
>> > 8.1.2 In Other Environments
>> >
>> >    Where UDP based encapsulation headers are used in TRILL over IP in
>> >    environments other than those discussed in Section 8.1.1, specific
>> >    congestion control mechanisms are commonly needed. However, if the
>> >    traffic being carried by the TRILL over IP link is already congestion
>> >    controlled and the size and volatility of the TRILL IS-IS link state
>> >    database is limited, then specific congestion control may not be
>> >    needed. See [RFC8085] Section 3.1.11 for further guidance.
>> >
>> > This is correct, however my question is if the RBridges have any way of knowing
>> > which traffic is actually congestion controlled, considering that TRILL provides
>> > a layer 2 abstraction. I wonder if there should be any type of white list of
>> > the types of layer 2 payloads that can be assumed to be congestion controlled,
>> > and thus okay to forward over IP paths? I am worried that the lack of any
>> > recommendation to prevent traffic that is not controlled from being forwarded
>> > can lead to congestion issues.
>> >
>> > The other issue I think may exist is the issue serial unicast emulation of
>> > broadcast/multicast creates. As this amplifies the outgoing packet rate by
>> > a factor of how many addresses are configured for serial unicast this can
>> > be significant traffic expansion. Thus, I think additional considerations are
>> > needed here, and maybe rate limiting of the amount of traffic to be multicast.
>>
>> OK. We can think about those issues.
>>
>> > Flow and ECMP
>> > -------------
>> >
>> > Section 8.3:
>> >
>> >    For example, for TRILL
>> >    Data, this entropy field could be based on some hash of the
>> >    Inner.MacDA, Inner.MacSA, and Inner.VLAN or Inner.FGL.
>> >
>> > I would appreciate clearer references to what these fields are.
>>
>> In a TRILL Data packet, the payload after the TRILL Header looks like
>> an Ethernet frame except that there is always either a VLAN tag or,
>> alternatively, where the VLAN tag would be, a Fine Grained Label
>> [RFC7172]. (The preceding is the view in the TRILL RFCs, but there is
>> an equivalent and equally valid view in which all the fields through
>> and including the VLAN or FGL tag are part of the TRILL Header.) The
>> TRILL base protocol specification focuses on Ethernet as a link
>> technology between TRILL switches, in which case there will be a link
>> header including Outer.MacDA and Outer.MacSA fields and possibly an
>> Outer.VLAN, all before the TRILL Header. See Figure 1 and Figure 2 in
>> RFC 7172.
>>
>> Some of the above could be added to the draft for clarity.
>>
>> > If I understand this correctly, the idea here is to look into the inner
>> > layer 2 frames, and use the flow equivalents that exist on that level and
>> > hash that into a value that maps the flows onto the source port range.
>>
>> Yes.
>>
>> > I think this text should include a summary of the principle and ensure to
>> > note the important requirement that what is considered flows in the inner
>> > must not result in being striped over multiple source ports as this may lead to
>> > reordering issues due to packets taking different paths.
>>
>> Well, we can add some text. But when would the relative ordering
>> matter for two TRILL Data packets where the two inner native payloads
>> have different values for any one or more of these three fields
>> (Inner.MacDA, Inner.MacSA, and inner VLAN/FGL tag)? If any of those
>> fields are different, you are talking about different streams.
>>
>> > NAT and TRILL over IP:
>> > Section 8.5:
>> >
>> > If one would like to use TRILL over IP through a NAT, then there are some very
>> > important considerations that are missing.
>> > First the need for static binding configurations or the need for
>> > determining one's external address(es) and being able to communicate
>> > that to the peer RBridges, and in addition ensuring that one
>> > has keep-alives so that the NAT binding never times out.
>>
>> I think those are good points. There is an additional problem that
>> TRILL Hellos detect neighbors with which they have 2-way connectivity
>> by indicating, inside the Hellos that are sent, from what neighbors
>> Hellos have been received on that port. If a NAT is involved, these
>> neighbor addresses inside Hellos need to be mapped.
>>
>> > Next is the issue that there is almost zero chance of getting an IP/UDP
>> > encapsulated TRILL payload through the NAT if it results in IP fragmentation,
>> > as NATs don't defragment and refragment on the internal side, and an IP
>> > fragment lacks a UDP port and thus can't be matched to a binding.
>>
>> So perhaps the recommendation should be to configure the port to use
>> TCP if there will be fragmentation.
>>
>> > Also if you like to run IP/ESP through a NAT, then you most likely need the
>> > IP/UDP/ESP encapsulation (https://tools.ietf.org/html/rfc3948). Note that this
>> > will restrict the MTU even further and thus ensure that the 1470 requirement
>> > cannot be fulfilled even without additional tunnels over a 1500 byte MTU
>> > Ethernet infrastructure.
>> >
>> > I would note that also firewalls likely have issues with IP fragments for the
>> > same reason, they require a significant amount of state to be verified if they
>> > should be let through.
>> >
>> > In general I think you should create a configuration that has a chance to work
>> > through most middleboxes, but I think you should require static bindings. I
>> > think that configuration is, and don't laugh now, IP/UDP/ESP/TCP/TRILL,
>> > otherwise you will not be able to have both security and reliable fragmentation
>> > of TRILL packets.
>>
>> OK.
>> Thanks again for this review. It has pointed out a number of
>> problems and in thinking about those, I believe a couple of further
>> problems have come to mind that I mentioned above. We'll work on a
>> revised draft.
>>
>> Thanks,
>> Donald
>> ===============================
>>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>  155 Beaver Street, Milford, MA 01757 USA
>>  d3e3e3@xxxxxxxxx
>>
>> > Cheers
>> >
>> > Magnus Westerlund
>>
>> _______________________________________________
>> Tsv-art mailing list
>> Tsv-art@xxxxxxxx
>> https://www.ietf.org/mailman/listinfo/tsv-art