Re: [Last-Call] Opsdir last call review of draft-ietf-alto-path-vector-17

Hi Tim,

Thanks for the review and the suggestions. We agree that more concrete use cases
and examples would be helpful and that some parts of the document need to be
clarified. We will revise the document accordingly. Please see our detailed
responses inline.

Best,
Kai

> -----Original Messages-----
&gt; From: "Tim Chown via Datatracker" <noreply@xxxxxxxx>
&gt; Sent Time: 2021-09-09 19:49:56 (Thursday)
&gt; To: ops-dir@xxxxxxxx
&gt; Cc: alto@xxxxxxxx, draft-ietf-alto-path-vector.all@xxxxxxxx, last-call@xxxxxxxx
&gt; Subject: Opsdir last call review of draft-ietf-alto-path-vector-17
&gt; 
&gt; Reviewer: Tim Chown
&gt; Review result: Not Ready
&gt; 
&gt; Hi,
&gt; 
&gt; I have reviewed this document (draft-ietf-opsec-v6-26) as part of the
&gt; Operational directorate's ongoing effort to review all IETF documents being
&gt; processed by the IESG.  These comments were written with the intent of
&gt; improving the operational aspects of the IETF drafts. Comments that are not
&gt; addressed in last call may be included in AD reviews during the IESG review. 
&gt; Document editors and WG chairs should treat these comments just like any other
&gt; last call comments.
&gt; 
&gt; This draft proposes an extension to the ALTO protocol to allow the definition
&gt; of Abstract Network Elements (ANEs) on a path between two endpoints that can be
&gt; considered when orchestrating connectivity between those endpoints, rather than
&gt; just computing based on the abstract cost of a path.  A Path Vector allows a
&gt; set of such ANEs to be defined for a path.
&gt; 
&gt; Caveat:
&gt; 
&gt; I am generally familiar with the work of the ALTO group.  My work at Jisc, a
&gt; national research and education network, includes assisting universities and
&gt; research organisations optimise large scale data transfers (up to petabytes of
&gt; data).
&gt; 
&gt; Overall:
&gt; 
&gt; I believe the document is generally well written, and the problem space it is
&gt; addressing is one for which there is value in defining a solution, but I feel
&gt; the document suffers from being too abstract and vague about what it is
&gt; defining, and its consideration of practical use cases could be improved.  Thus
&gt; I feel at this stage it is Not Ready for publication.
&gt; 
&gt; General comments:
&gt; 
&gt; The use cases defined are quite varied - large scale analytics, mobile and
&gt; CDNs.  SENSE and LHC are not specifically data analytics use cases in the usual
&gt; sense of the word, rather SENSE is a model for orchestrating network links (and
&gt; capacity) between sites, and the LHC provides large scale data sets for four
&gt; major experiments that are distributed and computed upon via the WLCG
&gt; (worldwide large hadron collider computing grid).

KAI:
The document originally targeted the data analytics use case, but it was later
found to be useful in other scenarios as well. We will focus on the analytics
use case in the next revision.

>
> For LHC, QoE is not so much about time to complete; the important point is not
> to have data backlogging if performance drops.
>
> For the WLCG, two networks have evolved over many years to carry the traffic
> from the four main experiments; LHCOPN, the optical network, and LHCONE, the
> overlay network, both of which are ‘manually’ configured, and with enough
> capacity for the traffic thanks to regular network forward look exercises.
> While a little complex to administer, other emerging disciplines have expressed
> interest in using LHCONE to move data, and some have established agreements
> (e.g. SKA, I believe).  While a means to provision capacity on demand would be
> attractive, the R&E networks typically have capacity, LHCOPN/LHCONE carry the
> LHC traffic, and bottlenecks are in the end sites (hence the evolution of the
> Science DMZ principles).

KAI:
Thanks very much for the clarification. Indeed, we conflated LHC with other data
analytics systems, which typically use the coflow abstraction [1] and optimize
for job completion time. In the next revision we will clarify that different
analytics systems have different QoE objectives, and illustrate how the path
vector extension can support each of these use cases.

>
> Some specific examples of ANEs would be very helpful.  While the document does
> contain examples, they are not grounded around a use case I can readily relate
> to, such as the orchestration of a large data flow between two sites in
> different R&E networks.  Can the doc show some real examples?
&gt; 

KAI:
That is a very good suggestion. We will add more examples in the next revision
to better motivate the use of ANEs.


> Section 3 talks of definitions of ANEs being “similar to” Network Elements in
> RFC2216, but this is vague.  The topology in Figure 5 is quite simple, as an
> example; something more realistic would be interesting.

KAI:
We will add a more realistic example to motivate the definition of ANEs and
their initial properties. As Figure 5 is used to illustrate the message format
examples, we will move it to the examples section.

> Ultimately, if ALTO
> clients have the full network topology even then they may not know about the
> routing that occurs by default, so implicitly there's an assumption of a
> capability to steer traffic to meet a request.

KAI:
This is not entirely true. With path vector, the routing is already determined
for a given source and destination pair, so the client must not assume that the
ALTO server has the capability to modify the routing. In fact, in most cases the
network only exposes information about the path and does not provide any control
capability inside the network. For certain use cases, the network may provide
some level of control. For example, if a network allows clients to reserve
bandwidth for end-to-end communication, it may configure an ALTO server to
provide the `max-reservable-bandwidth` property. Note that this is not an issue
specific to the path vector document but to the ALTO framework as a whole: ALTO
carries the information, and how the information is used depends on a
higher-layer protocol. We will make this clear in the next revision.
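To make this concrete, here is a minimal Python sketch of how a client might use
such information. It assumes the server returns, for each source and destination
pair, the list of ANEs on the path the network already uses, plus a
`max-reservable-bandwidth` value per ANE. The ANE names, addresses and numbers
are hypothetical, and the plain dictionaries stand in for the actual ALTO message
formats. Because the path is fixed by the network, the most a single flow could
reserve is the smallest value along its path.

```python
# Hypothetical path vector: for each (source, destination) pair, the ANEs on
# the path the network already routes over. The client cannot change this path.
PATH_VECTOR = {
    ("ipv4:192.0.2.2", "ipv4:198.51.100.34"): ["L1", "L2"],
    ("ipv4:192.0.2.3", "ipv4:198.51.100.34"): ["L3", "L2"],
}

# Hypothetical per-ANE property values, in bits per second.
ANE_PROPERTIES = {
    "L1": {"max-reservable-bandwidth": 10_000_000_000},
    "L2": {"max-reservable-bandwidth": 8_000_000_000},
    "L3": {"max-reservable-bandwidth": 5_000_000_000},
}


def reservable_bandwidth(src: str, dst: str) -> int:
    """Most bandwidth a single flow could reserve on the fixed src->dst path."""
    anes = PATH_VECTOR[(src, dst)]
    return min(ANE_PROPERTIES[ane]["max-reservable-bandwidth"] for ane in anes)


if __name__ == "__main__":
    print(reservable_bandwidth("ipv4:192.0.2.2", "ipv4:198.51.100.34"))
```

Whether a reservation is then actually made, and by what mechanism, is left to
the higher-layer protocol; the sketch only reads the exposed values.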

> What is the “request” referred to in 5.1.2, for example?

KAI:
The requests in Section 5.1.2 refer to HTTP requests to ALTO services, mostly
requests to unified property services or to the same path vector resource.

>
> It seems that the document argues that ‘bottlenecks’ are typically capacity
> based; do ANEs include specific links, rather than routers, firewalls, etc?   A
> stateful firewall can be a significant bottleneck on throughput, for example.

KAI:
ANEs can represent routers, firewalls and other middleboxes. However, an ALTO
server may not want, and may not need, to distinguish what the bottleneck really
is; this is in fact one reason why we use the term "abstract network element".
For example, the maximum throughput of a firewall can be exposed to ALTO clients
as the capacity of an ANE. We will add the firewall example to illustrate the use
of ANEs in the next revision.
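A minimal server-side sketch of this abstraction, assuming the hidden elements
sit in series on the path: the server collapses them into one ANE and exposes
only the smallest maximum throughput as its capacity. The `Element` type, the
`abstract_ane` helper and the "capacity" key are hypothetical illustrations, not
definitions from the draft.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """An internal network element; its name and kind are never shown to clients."""
    name: str
    kind: str  # e.g. "link", "router", "firewall"
    max_throughput_bps: int


def abstract_ane(ane_name: str, elements: list) -> dict:
    """Collapse elements in series into one ANE (hypothetical helper).

    Traffic must traverse every element, so the slowest one limits the
    path. A stateful firewall slower than the link simply shows up as a
    lower ANE capacity; the client cannot tell which element is the cause.
    """
    return {"name": ane_name,
            "capacity": min(e.max_throughput_bps for e in elements)}


if __name__ == "__main__":
    chain = [Element("uplink-1", "link", 100_000_000_000),
             Element("fw-1", "firewall", 20_000_000_000)]
    print(abstract_ane("A1", chain))
```

The client sees an ANE named "A1" with a 20 Gbps capacity, without learning that
a firewall, rather than the link, is what limits it.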

>
> In 4.2.1 it talks of ALTO client identifying bottlenecks; a little more
> discussion and examples of that would be useful, for practical use cases such
> as an international R&E data transfer.

KAI:
We will add more discussion on identifying bottlenecks with path vectors. Some
pointers are listed below.
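As one simple heuristic, offered only as an illustration and not as a procedure
defined in the draft, a client can flag an ANE as a shared bottleneck when
several of its flows cross it and, if each flow ran at the rate its own path
allows, their combined rate would exceed the ANE's capacity. The flow names, ANE
names and capacities below are hypothetical.

```python
from collections import defaultdict


def shared_bottlenecks(paths: dict, capacity: dict) -> dict:
    """Return {ANE: [flows]} for ANEs that may jointly constrain several flows.

    paths: flow id -> list of ANE names (as returned in a path vector)
    capacity: ANE name -> available bandwidth (hypothetical property)
    """
    # Rate each flow could reach if it were alone: the minimum along its path.
    solo = {f: min(capacity[a] for a in anes) for f, anes in paths.items()}

    # Which flows traverse each ANE.
    users = defaultdict(list)
    for f, anes in paths.items():
        for a in set(anes):
            users[a].append(f)

    # Shared and oversubscribed: the flows cannot all get their solo rate.
    return {a: sorted(flows) for a, flows in users.items()
            if len(flows) > 1 and sum(solo[f] for f in flows) > capacity[a]}


if __name__ == "__main__":
    paths = {"f1": ["L1", "L2"], "f2": ["L3", "L2"]}
    capacity = {"L1": 10, "L2": 8, "L3": 5}  # Gbps, hypothetical
    print(shared_bottlenecks(paths, capacity))
```

Here f1 alone could reach 8 and f2 alone 5, but both cross L2 with capacity 8,
so L2 is reported as a shared bottleneck; per-pair costs alone would not reveal
that the two flows compete.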

> The discussion on p.9 about multiple flows is a little odd; in practice in R&E
> networks large transfers use tools like GridFTP which uses multiple parallel
> TCP flows, such that loss on individual flows does not severely impact
> throughput.  Of course, BBR also reduces this concern.

KAI:
With GridFTP and BBR, the multiple flows are established between the same source
and destination. The "multiple flows" in the example, however, are data
transfers between different source and destination pairs that belong to the
same task (as in the coflow setting [1]).

Handling multiple flows between the same source and destination pair is
certainly an important use case. However, it cannot be solved completely by the
path vector draft alone. An individual draft, "flow cost service" [2], can
potentially provide information for this use case, together with the path vector
extension.
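To illustrate why per-pair information is not enough in the coflow case, here is
a minimal sketch of a standard lower bound on a coflow's completion time: no
schedule can finish before the most loaded ANE has carried all the data that must
cross it. It assumes each flow sends a known volume over a fixed path and that
ANE capacities are known; all names and numbers are hypothetical.

```python
def coflow_lower_bound(flows: dict, capacity: dict) -> float:
    """Lower bound on the completion time of a coflow.

    flows: flow id -> (list of ANEs on its path, volume to transfer)
    capacity: ANE -> volume per unit time
    """
    load = {}
    for anes, volume in flows.values():
        for ane in set(anes):  # count each ANE once per flow
            load[ane] = load.get(ane, 0) + volume
    return max(load[ane] / capacity[ane] for ane in load)


def per_pair_estimate(flows: dict, capacity: dict) -> float:
    """The same estimate if each flow is treated as alone on its path."""
    return max(volume / min(capacity[a] for a in anes)
               for anes, volume in flows.values())


if __name__ == "__main__":
    flows = {"f1": (["L1", "L2"], 400), "f2": (["L3", "L2"], 400)}
    capacity = {"L1": 100, "L2": 80, "L3": 50}
    print(per_pair_estimate(flows, capacity))   # 8.0
    print(coflow_lower_bound(flows, capacity))  # 10.0
```

Treated independently, the slower flow finishes after 8 time units; the shared
ANE L2 must carry 800 units at rate 80, which raises the bound to 10.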


>
> Is the use of ALTO designed for single domain, or can it span multiple domains?
> It seems the latter, given the definition of ANE domains, but for the latter
> there is no specific model for the common definition of ANEs.
&gt; 

KAI:
The extension specified in this document is designed for a single administrative
domain. The term "ANE domain" might be misinterpreted: the domain here does not
refer to a network domain. Rather, it is inherited from the "entity domain"
defined in Sec 3.2 of the I-D.ietf-alto-unified-props-new document [3], where
"domain" is used in the mathematical sense: the set of valid objects of a
specific type. In the unified property extension, an entity domain is defined by
a specific ALTO resource (called the defining information resource).
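A minimal sketch of this mathematical sense of "domain", assuming a simplified
model in which an entity domain is just a type, the resource that defines it,
and the set of identifiers that resource makes valid. The `EntityDomain` class,
the resource names and the ANE identifiers are hypothetical and do not follow
the wire format of [3].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityDomain:
    """Hypothetical model of an entity domain.

    "Domain" is meant in the mathematical sense: the set of valid objects of
    one type, as defined by one information resource. It says nothing about
    network or administrative domains.
    """
    domain_type: str        # e.g. "ane"
    defining_resource: str  # hypothetical resource id
    members: frozenset

    def invalid(self, entity_ids) -> list:
        """Return the requested ids that are not valid in this domain."""
        return sorted(e for e in entity_ids if e not in self.members)


if __name__ == "__main__":
    pv_a = EntityDomain("ane", "path-vector-a", frozenset({"L1", "L2", "L3"}))
    pv_b = EntityDomain("ane", "path-vector-b", frozenset({"L1", "L4"}))
    print(pv_a.invalid(["L2", "L4"]))  # ['L4']
    print(pv_b.invalid(["L2", "L4"]))  # ['L2']
```

Both domains have the same type, but because they are defined by different
resources they contain different objects, which is why an ANE name is only
meaningful together with its defining resource.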

> Given the definition of ANEs and PVs, how is traffic then orchestrated or
> optimised?  Some pointers here would be useful.  SENSE may be one example.
> From my own discussion with people involved with SENSE (and AutoGOLE which uses
> it) there is as yet no use of ALTO (rather SENSE uses its own methods to
> orchestrate based on intent-based descriptors), but it is something that may be
> considered in the future.

KAI:
There are different ways to realize traffic reservation, e.g., MPLS tunnels,
OpenFlow rules, or end-host-based traffic control (e.g., the Linux tc command).
For specific orchestration mechanisms, please see [4]-[6] below. We will add
these pointers to the use cases section.

>
> What of non-ALTO traffic on the same links; is the approach to reserve x%
> capacity of a link for ALTO orchestrated traffic (the SENSE approach, I
> believe)?

KAI:
ALTO is mainly used to expose capacity information to the client; how the
resource reservation is actually achieved is outside the scope of the document.

>
> Tim
>


[1] Chowdhury, M. and Stoica, I. 2012. Coflow: A Networking Abstraction for Cluster
Applications. Proceedings of the 11th ACM Workshop on Hot Topics in Networks
(New York, NY, USA, 2012), 31–36.

[2] https://tools.ietf.org/search/draft-gao-alto-fcs-06

[3] https://datatracker.ietf.org/doc/html/draft-ietf-alto-unified-props-new-18

[4] Viswanathan, R., Ananthanarayanan, G. and Akella, A. 2016. CLARINET:
WAN-Aware Optimization for Analytics Queries. 12th USENIX Symposium on Operating
Systems Design and Implementation (OSDI 16) (Savannah, GA, 2016), 435–450.

[5] Xiang, Q., Chen, S., Gao, K., Newman, H., Taylor, I., Zhang, J. and Yang,
Y.R. 2017. Unicorn: Unified resource orchestration for multi-domain,
geo-distributed data analytics. 2017 IEEE SmartWorld, Ubiquitous Intelligence &
Computing, Advanced & Trusted Computed, Scalable Computing & Communications,
Cloud & Big Data Computing, Internet of People and Smart City Innovation
(SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) (Aug. 2017), 1–6.

[6] Xiang, Q., Zhang, J.J., Wang, X.T., Liu, Y.J., Guok, C., Le, F., MacAuley, J.,
Newman, H. and Yang, Y.R. 2018. Fine-grained, Multi-domain Network Resource
Abstraction As a Fundamental Primitive to Enable High-performance, Collaborative
Data Sciences. Proceedings of the International Conference for High Performance
Computing, Networking, Storage, and Analysis (Piscataway, NJ, USA, 2018),
5:1-5:13.
-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call



