Hi Tim,

I'm really sorry about the wrong spelling. My apologies.

Best,
Kai

> -----Original Messages-----
> From: kaigao@xxxxxxxxxx
> Sent Time: 2021-10-27 11:40:01 (Wednesday)
> To: "Tim Chown" <tim.chown@xxxxxxxxxx>
> Cc: "draft-ietf-alto-path-vector.all@xxxxxxxx" <draft-ietf-alto-path-vector.all@xxxxxxxx>, "last-call@xxxxxxxx" <last-call@xxxxxxxx>, "ops-dir@xxxxxxxx" <ops-dir@xxxxxxxx>, "alto@xxxxxxxx" <alto@xxxxxxxx>
> Subject: Re: [alto] Opsdir last call review of draft-ietf-alto-path-vector-17
>
> Dear Time,
>
> We have just submitted a revision of the Path Vector draft (-19). Below
> are the links to the latest revision and the diffs. Please see inline
> the pointers to the proposed changes to the comments, in brief:
>
> 1. More detailed examples are given in Sec 4.2:
>    a. what is the role of ALTO server/client in the scenario;
>    b. what an ANE represents and what information can be provided;
>    c. how the ALTO client can use the information.
> 2. Clarification texts for "domain" in Sec 6.2.
> 3. Clarification text is added in Sec 6.4.1 for the role of ALTO
>    when the property "max-reservable-bandwidth" is provided.
> 4. Examples of the initial properties are explained in Sec 6.4.3.
>
> Draft:
> https://www.ietf.org/archive/id/draft-ietf-alto-path-vector-19.html
>
> Diffs:
> https://www.ietf.org/rfcdiff?url1=draft-ietf-alto-path-vector-17&url2=draft-ietf-alto-path-vector-19
>
> With the revision, we hope the draft is now clearer and easier to follow.
> Please feel free to let us know if there are further comments or suggestions.
> Thanks!
>
> Best,
> Kai
>
> > -----Original Messages-----
> > From: "Tim Chown" <tim.chown@xxxxxxxxxx>
> > Sent Time: 2021-09-27 17:40:10 (Monday)
> > To: "kaigao@xxxxxxxxxx" <kaigao@xxxxxxxxxx>
> > Cc: "ops-dir@xxxxxxxx" <ops-dir@xxxxxxxx>, "alto@xxxxxxxx" <alto@xxxxxxxx>, "draft-ietf-alto-path-vector.all@xxxxxxxx" <draft-ietf-alto-path-vector.all@xxxxxxxx>, "last-call@xxxxxxxx" <last-call@xxxxxxxx>
> > Subject: Re: Opsdir last call review of draft-ietf-alto-path-vector-17
> >
> > Hi,
> >
> > > On 24 Sep 2021, at 07:54, kaigao@xxxxxxxxxx wrote:
> > >
> > > Hi Tim,
> > >
> > > Thanks for the review and suggestion. We agree that more concrete use cases and
> > > examples will be helpful and some parts of the document need to be better
> > > clarified. We will revise the document accordingly. Please see inline for detailed
> > > comments.
> >
> > Inline with TC> …
> >
> > > Best,
> > > Kai
> > >
> > > > -----Original Messages-----
> > > > From: "Tim Chown via Datatracker" <noreply@xxxxxxxx>
> > > > Sent Time: 2021-09-09 19:49:56 (Thursday)
> > > > To: ops-dir@xxxxxxxx
> > > > Cc: alto@xxxxxxxx, draft-ietf-alto-path-vector.all@xxxxxxxx, last-call@xxxxxxxx
> > > > Subject: Opsdir last call review of draft-ietf-alto-path-vector-17
> > > >
> > > > Reviewer: Tim Chown
> > > > Review result: Not Ready
> > > >
> > > > Hi,
> > > >
> > > > I have reviewed this document (draft-ietf-opsec-v6-26) as part of the
> > > > Operational directorate's ongoing effort to review all IETF documents being
> > > > processed by the IESG. These comments were written with the intent of
> > > > improving the operational aspects of the IETF drafts. Comments that are not
> > > > addressed in last call may be included in AD reviews during the IESG review.
> > > > Document editors and WG chairs should treat these comments just like any other
> > > > last call comments.
> > > >
> > > > This draft proposes an extension to the ALTO protocol to allow the definition
> > > > of Abstract Network Elements (ANEs) on a path between two endpoints that can be
> > > > considered when orchestrating connectivity between those endpoints, rather than
> > > > just computing based on the abstract cost of a path. A Path Vector allows a
> > > > set of such ANEs to be defined for a path.
> > > >
> > > > Caveat:
> > > >
> > > > I am generally familiar with the work of the ALTO group. My work at Jisc, a
> > > > national research and education network, includes assisting universities and
> > > > research organisations optimise large scale data transfers (up to petabytes of
> > > > data).
> > > >
> > > > Overall:
> > > >
> > > > I believe the document is generally well written, and the problem space it is
> > > > addressing is one for which there is value in defining a solution, but I feel
> > > > the document suffers from being too abstract and vague about what it is
> > > > defining, and its consideration of practical use cases could be improved. Thus
> > > > I feel at this stage it is Not Ready for publication.
> > > >
> > > > General comments:
> > > >
> > > > The use cases defined are quite varied - large scale analytics, mobile and
> > > > CDNs. SENSE and LHC are not specifically data analytics use cases in the usual
> > > > sense of the word, rather SENSE is a model for orchestrating network links (and
> > > > capacity) between sites, and the LHC provides large scale data sets for four
> > > > major experiments that are distributed and computed upon via the WLCG
> > > > (worldwide large hadron collider computing grid).
> > >
> > > KAI:
> > > The document was first originated to support the data analytics use case, but
> > > later was found to be useful in other scenarios. We will focus on the
> > > analytics use case in the next revision.
> >
> > TC> OK, that’s fine.
> > I know from speaking to people in groups such as at the GNA-G
> > Data Intensive Science WG that ALTO principles are of interest, but it would take some
> > significant effort to adopt them. So perhaps there’s a future Informational document
> > to be written around that use case.
> >
>
> KAI: Indeed. Some early studies that investigate the direction of using ALTO to provide
> resource discovery in data science networks (UNICORN and ReSA) are included in the references.
> Another related work is G2 by Reservoir Labs, and we are working with Reservoir Labs to integrate
> ALTO in their framework. A talk on the integration will be given at IETF 112.
>
> Regarding the use cases, we have included the following scenarios: 1) exposing network
> bottlenecks with ALTO Path Vector and 2) exposing topology/resources of service edges. For
> both scenarios, we draw images to show how ALTO is integrated and give examples of what
> information can be provided.
>
> > > >
> > > > For LHC, QoE is not so much about time to complete; the important point is not
> > > > to have data backlogging if performance drops.
> > > >
> > > > For the WLCG, two networks have evolved over many years to carry the traffic
> > > > from the four main experiments; LHCOPN, the optical network, and LHCONE, the
> > > > overlay network, both of which are ‘manually’ configured, and with enough
> > > > capacity for the traffic thanks to regular network forward look exercises.
> > > > While a little complex to administer, other emerging disciplines have expressed
> > > > interest in using LHCONE to move data, and some have established agreements
> > > > (e.g. SKA, I believe). While a means to provision capacity on demand would be
> > > > attractive, the R&E networks typically have capacity, LHCOPN/LHCONE carry the
> > > > LHC traffic, and bottlenecks are in the end sites (hence the evolution of the
> > > > Science DMZ principles).
> > >
> > > KAI:
> > > Thanks very much for the clarification.
> > > Indeed we intermingled LHC with other
> > > data analytics systems, which typically use the coflow abstraction [1] and
> > > optimize for job completion time. We will clarify in the next revision that
> > > different analytics systems have different QoE objectives and illustrate how
> > > the path vector extension can support these use cases respectively.
> >
> > TC> I think generally the LHCONE overlay is used more to support traffic engineering
> > (and to some extent trust) at site ingress/egress borders, e.g. to differentiate the science
> > traffic from the ‘day to day’ campus ‘business’ traffic. This reflects the Science DMZ
> > principles later documented by ESnet.
> >
>
> KAI: We have separated the data analytics case into 1) the network is controlled by a single
> network manager as in the geo-distributed data center case or an SDN network [NOVA], and
> 2) the network consists of multiple networks [Unicorn/ReSA]. We also add [G2] as a reference
> to demonstrate how the information can be used by the ALTO client.
>
> > > >
> > > > Some specific examples of ANEs would be very helpful. While the document does
> > > > contain examples, they are not grounded around a use case I can readily relate
> > > > to, such as the orchestration of a large data flow between two sites in
> > > > different R&E networks. Can the doc show some real examples?
> > > >
> > >
> > > KAI:
> > > That is a very good suggestion. We will add more examples in the next
> > > revision to better motivate the use of ANE.
> >
> > TC> Great, thank you.
> >
>
> KAI: Please see Section 4.2 for the examples.
>
> > > > Section 3 talks of definitions of ANEs being “similar to” Network Elements in
> > > > RFC2216, but this is vague. The topology in Figure 5 is quite simple, as an
> > > > example; something more realistic would be interesting.
> > >
> > > KAI:
> > > We will add a more realistic example to motivate the definition of ANE and the
> > > initial properties.
> > > As figure 5 is used to illustrate the examples of message
> > > formats, we will move it to the example section.
> >
> > TC> that will also be very useful, thank you.
>
> KAI: Please see Section 4.2 for the examples.
>
> > > >
> > > > Ultimately, if ALTO
> > > > clients have the full network topology even then they may not know about the
> > > > routing that occurs by default, so implicitly there's an assumption of a
> > > > capability to steer traffic to meet a request.
> > >
> > > KAI:
> > > This is not entirely true. With path vector, the routing is already specified
> > > for a given source and destination pair. Thus, the client must not assume that
> > > the ALTO server has the capability to modify the routing. In fact, for most
> > > cases, the network only exposes information about the path and does not provide
> > > any control capability inside the network. For certain use cases the network may
> > > provide certain levels of control capability, for example, if a network allows
> > > clients to reserve bandwidth for end-to-end communication, it may configure an
> > > ALTO server to provide the `max-reservable-bandwidth` property. Note this is not
> > > an issue specific to the path vector document but to the ALTO framework: ALTO
> > > carries the information but how to use the information depends on a higher-layer
> > > protocol. We will make this clear in the next revision.
> >
> > TC> That’s a useful clarification, again thanks.
> >
>
> KAI: Clarification texts are added in Section 6.4.1. We emphasize that ALTO is only
> used for information exposure.
>
> > > > What is the “request” referred to in 5.1.2, for example?
> > >
> > > KAI:
> > > The requests in 5.1.2 are referring to HTTP requests to ALTO services, mostly
> > > requests to unified property services or requests to the same path vector resource.
> >
> > TC> OK.
> >
>
> KAI: We change "requests" to "requests to other ALTO resources" in Sec 5.1.2.
>
> > > >
> > > > It seems that the document argues that ‘bottlenecks’ are typically capacity
> > > > based; do ANEs include specific links, rather than routers, firewalls, etc? A
> > > > stateful firewall can be a significant bottleneck on throughput, for example.
> > >
> > > KAI:
> > > ANE can include routers, firewalls and other middleboxes. However, an ALTO
> > > server may not want and may not need to distinguish what the bottleneck really
> > > is -- it is actually one reason why we use the term "abstract network
> > > element". For example, the maximum throughput of a firewall can be considered
> > > as the capacity of the ANE exposed to the ALTO clients. We will add the
> > > firewall example to illustrate the use of ANE in the next revision.
> >
> > TC> I think the ‘problem’ is that by keeping the reference/naming “Abstract” it is
> > harder to ground the text in a real use case, so examples would help.
>
> KAI: Examples of ANEs are presented both in Sec 4.2 (as part of specific use cases) and
> in Sec 5.1 (as a standalone example).
>
> >
> > In the Science DMZ case, campus firewalls (full stateful devices, with IDS) are often
> > a significant bottleneck (for example I saw a case recently where a 20G path only
> > achieved 8G for a science flow due to the IDS, even with it configured not to scan
> > that traffic).
> >
> > > >
> > > > In 4.2.1 it talks of ALTO client identifying bottlenecks; a little more
> > > > discussion and examples of that would be useful, for practical use cases such
> > > > as an international R&E data transfer.
> > >
> > > KAI:
> > > We will add more discussions on identifying bottlenecks with path vector. Some
> > > pointers are attached below.
> >
> > TC> OK.
>
> KAI: We add the pointers in Sec 4.2.
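
As a rough illustration of the bottleneck discussion above, the sketch below treats each ANE as an opaque name with one capacity value and picks the lowest-capacity ANE on each path. Everything in it is an assumption made for illustration: the ANE names, the endpoint names, the capacity figures (the 8 Gbit/s firewall value only echoes the anecdote above) and the in-memory layout, which is not the message format defined in the draft. Only the property name "max-reservable-bandwidth" is taken from the thread.

```python
from typing import Dict, List, Tuple

# Hypothetical per-ANE value of the "max-reservable-bandwidth" property, in Gbit/s.
# An ANE may stand for a link, a firewall or any other middlebox; the client
# only sees the abstract name and its properties.
ANE_CAPACITY: Dict[str, float] = {
    "ane-campus-fw": 8.0,    # e.g. a stateful firewall with IDS
    "ane-ren-core": 100.0,   # e.g. an aggregated backbone segment
    "ane-dtn-link": 20.0,    # e.g. the access link of a data transfer node
}

# Hypothetical path vectors: for each (source, destination) pair, the ANEs
# that the already-determined route traverses.
PATH_VECTORS: Dict[Tuple[str, str], List[str]] = {
    ("site-a", "site-b"): ["ane-campus-fw", "ane-ren-core"],
    ("site-a", "site-c"): ["ane-dtn-link", "ane-ren-core"],
}


def bottleneck(path: List[str], capacity: Dict[str, float]) -> Tuple[str, float]:
    """Return the ANE with the smallest capacity on the path, and that capacity."""
    name = min(path, key=lambda ane: capacity[ane])
    return name, capacity[name]


def shared_anes(vectors: Dict[Tuple[str, str], List[str]]) -> List[str]:
    """Return ANEs that appear on more than one path (flows that may compete)."""
    counts: Dict[str, int] = {}
    for path in vectors.values():
        for ane in set(path):
            counts[ane] = counts.get(ane, 0) + 1
    return sorted(ane for ane, n in counts.items() if n > 1)


if __name__ == "__main__":
    for pair, path in PATH_VECTORS.items():
        print(pair, bottleneck(path, ANE_CAPACITY))
    print("shared:", shared_anes(PATH_VECTORS))
```

The client here only reads information; as noted in the thread, how a reservation or any traffic steering is actually carried out is left to a higher-layer mechanism.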
>
> > > >
> > > > The discussion on p.9 about multiple flows is a little odd; in practice in R&E
> > > > networks large transfers use tools like GridFTP which uses multiple parallel
> > > > TCP flows, such that loss on individual flows does not severely impact
> > > > throughput. Of course, BBR also reduces this concern.
> > >
> > > KAI:
> > > For GridFTP and BBR, the multiple flows are established between the same
> > > source and destination but the example contains two "flows" of two source and
> > > destination pairs. The "multiple flows" in the example, however, represent
> > > data transfers between different source and destination pairs but of the same
> > > task (as in the coflow setting [1]).
> > >
> > > Handling multiple flows between the same source and destination pair is
> > > certainly an important use case. However, it cannot be solved completely by
> > > the path vector draft alone. There is an individual draft called "flow cost
> > > service" [2] which can potentially provide information for this use case,
> > > together with the path vector extension.
> >
> > TC> OK, thanks. In the LHC type of use case there are often flows between for
> > example worker CPU nodes and remote data transfer nodes, so your example
> > would fit that. But sometimes there are flows between logical DTNs at each site.
> >
> > > >
> > > > Is the use of ALTO designed for single domain, or can it span multiple domains?
> > > > It seems the latter, given the definition of ANE domains, but for the latter
> > > > there is no specific model for the common definition of ANEs.
> > > >
> > >
> > > KAI:
> > > The extension specified in this document is designed for a single administrative
> > > domain. The term "ANE domain" might be misinterpreted: the domain here does not
> > > refer to a network domain.
> > > Rather, it is inherited from the "entity domain"
> > > defined in Sec 3.2 of the I-D.ietf-alto-unified-props-new document [3], which is used more
> > > in the mathematical sense of "domain": the set of valid objects of a specific type.
> > > In the unified property extension, an entity domain is defined by a specific ALTO
> > > resource (called defining information resource).
> >
> > TC> OK, so that would be something that would be very useful to clarify, and probably mention
> > early in the document.
> >
>
> KAI: Clarification texts are added in Sec 6.2.
>
> > > > Given the definition of ANEs and PVs, how is traffic then orchestrated or
> > > > optimised? Some pointers here would be useful. SENSE may be one example.
> > > > From my own discussion with people involved with SENSE (and AutoGOLE which uses
> > > > it) there is as yet no use of ALTO (rather SENSE uses its own methods to
> > > > orchestrate based on intent-based descriptors), but it is something that may be
> > > > considered in the future.
> > >
> > > KAI:
> > > There are different ways to realize the traffic reservation: MPLS tunnels,
> > > OpenFlow rules, or end-based traffic control (e.g., Linux tc command). For
> > > specific orchestration mechanisms, please see below ([4]-[6]) for some pointers. We
> > > will add these pointers to the use cases section.
> >
> > TC> Thanks.
> >
>
> KAI: We have added the pointers in Sec 4.2 and in Sec 6.4.1.
>
> > > >
> > > > What of non-ALTO traffic on the same links; is the approach to reserve x%
> > > > capacity of a link for ALTO orchestrated traffic (the SENSE approach, I
> > > > believe)?
> > >
> > > KAI:
> > > ALTO is mainly used to expose the capacity information to the client and how the
> > > resource reservation is actually achieved is not in the scope of the document.
> >
> > TC> OK, so again clarifying that is useful (to someone like me not following the
> > work in great detail).
> >
>
> KAI: Clarification texts are added in Sec 4.2.1 and in Sec 6.4.1.
>
> > Overall it’s a good draft, but I think the above extra examples and clarifications would
> > be very welcome.
> >
> > Best wishes,
> > Tim
> >
> > > >
> > > > Tim
> > > >
> > >
> > > [1] Chowdhury, M. and Stoica, I. 2012. Coflow: A Networking Abstraction for Cluster
> > > Applications. Proceedings of the 11th ACM Workshop on Hot Topics in Networks
> > > (New York, NY, USA, 2012), 31–36.
> > >
> > > [2] https://tools.ietf.org/search/draft-gao-alto-fcs-06
> > >
> > > [3] https://datatracker.ietf.org/doc/html/draft-ietf-alto-unified-props-new-18
> > >
> > > [4] Viswanathan, R., Ananthanarayanan, G. and Akella, A. 2016. CLARINET:
> > > WAN-Aware Optimization for Analytics Queries. 12th USENIX Symposium on Operating
> > > Systems Design and Implementation (OSDI 16) (Savannah, GA, 2016), 435–450.
> > >
> > > [5] Xiang, Q., Chen, S., Gao, K., Newman, H., Taylor, I., Zhang, J. and Yang,
> > > Y.R. 2017. Unicorn: Unified resource orchestration for multi-domain,
> > > geo-distributed data analytics. 2017 IEEE SmartWorld, Ubiquitous Intelligence
> > > Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud
> > > Big Data Computing, Internet of People and Smart City Innovation
> > > (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) (Aug. 2017), 1–6.
> > >
> > > [6] Xiang, Q., Zhang, J.J., Wang, X.T., Liu, Y.J., Guok, C., Le, F., MacAuley, J.,
> > > Newman, H. and Yang, Y.R. 2018. Fine-grained, Multi-domain Network Resource
> > > Abstraction As a Fundamental Primitive to Enable High-performance, Collaborative
> > > Data Sciences. Proceedings of the International Conference for High Performance
> > > Computing, Networking, Storage, and Analysis (Piscataway, NJ, USA, 2018),
> > > 5:1-5:13.