Bob,
Thank you very much for reviewing the draft and provided
in-depth comments. I am very sorry for the delayed response
due to traveling.
Replies to your comments are inserted below marked by
[Linda]:
Reviewer: Bob Briscoe
Review result: Not Ready
I have been selected as the Transport Directorate
reviewer for this draft. The Transport Directorate seeks to
review all transport or transport-related drafts as they
pass through IETF last call and IESG review, and sometimes
on special request. The purpose
of the review is to provide assistance to the Transport ADs.
For more information about the Transport Directorate Reviews
and the Transport Area Review Team, please see
https://trac.ietf.org/trac/tsv/wiki/TSV-Directorate-Reviews
In this case, very very few of the review comments relate
to transport issues, although the greatest issue concerns a
desire that the network could pause or stop connections
during L3 VM Mobility, which is certainly a transport issue.
[Linda] There is “Hot Migration” with transport service
continuing, and there is a “Cold Migration”, which is a
common practice in many data centers, which stop the task
running on the old place and move to the new place before
restart as described in
the Task Migration.
Is it helpful to add this description to the draft?
==Summary==
The technical aspects of the draft concerning L2 VM
mobility (within a subnet) seem sound. However, this is only
part of the draft, which has the following
issues:
#. The introduction does not say what the purpose of
publishing this draft is.
It seems that, rather than describing a specific protocol
or protocols, it intends to describe the overall system
procedure that would typically be used in DCs for VM
mobility. It is tagged as a BCP, but it does not say who
needs this BCP, why it is useful
for the IETF to publish this BCP, how wide the authors'
knowledge is of current practice (given DCs are private), or
why this is a BCP rather than a protocol spec.
[Linda] The first paragraph on Page 3 has the description
why VM Mobility is needed. Is it helpful to move this
paragraph to the beginning of the Introduction Section?
“Virtualization
which is being used in almost all of today’s data
centers
enables many virtual machines to run on a single
physical
computer
or compute server. Virtual machines (VM) need hypervisor
running
on the physical compute server to provide them shared
processor/memory/storage.
Network connectivity is provided by the
network
virtualization edge (NVE) [RFC8014]. Being able to move
VMs
dynamically,
or live migration, from one server to another allows for
dynamic
load balancing or work distribution and thus it is a
highly
desirable
feature [RFC7364].”
The draft starts out (S.3) as if it intends to say what a
good VM Mobility protocol should or shouldn't do, but the
rest of the document doesn't give any reasoning for these
recommendations, it just asserts what appears to be one view
of how a whole VM
Mobility system works, sometimes referring to one example
protocol RFC for a component part, but more often with no
references or details.
[Linda] Is it helpful to move the paragraph above to the
beginning of the Introduction Section? So that audience is
aware of why VM Mobility is needed. And then follow up with
what a good VM Mobility protocol should or shouldn't do?
#. It does not seem as if the NVO WG has discussed the
purpose of using normative text in this draft. See detailed
comments.
[Linda] The “Intended status” of the draft is “Best
Current Practice”. So all the text are not “normative”. Is
it Okay?
#. The draft silently slips back and forth between VM
mobility and VM redundancy, without recognizing the
differences. See detailed comments.
[Linda] There is only one usage of “redundancy” in the
entire document, used under the context of “Hot standby
option”, indicating the “redundancy” of “the VMs in both
primary and secondary domains have identical information and
can provide services
simultaneously as in load-share mode of operation” being
expensive.
#. Please adopt different terminology than "source NVE"
and "destination NVE", which are really poor choices of
terms for an intermediate node. See detailed comments. Why
not use "old NVE" and "new NVE", which is what you mean?
[Linda] Thanks for the suggestion. We will change to “Old
NVE”, and “new NVE”.
#. Applicability is fairly clearly outlined, but it is
not clear whether hosts corresponding with the mobile VMs
are part of the same controlled environment or on the
uncontrolled public Internet. See detailed comments.
[Linda] “Hosts” are the App running on the VM. It is the
under the same controlled environment. Not on uncontrolled
public internet.
#. Section 4.2.1 on L3 VM mobility reads like some
potential half-thought-through ideas on how to solve L3
mobility, rather than current practice, let alone best
current practice. Either current practice should be
described instead, or the scope of the
draft should be narrowed solely to L2 VM mobility. See
detailed comments.
[Linda] This is refereeing to “Cold Migration”, which is
a common practice in many data centers.
# The VM's file system is described as state that moves
with the VM (S.6), but VM mobility solutions often move the
VM but stitch it back to its (unmoved) storage. Conversely,
the storage can also move independent of the VM.
[Linda] It depends. When a VM move to a different zone,
the storage/file can becomes inaccessible.
#. The draft omits some of the security, transport and
management aspects of VM mobility. See detailed comments.
[Linda] Can you provide some text?
#. The draft reads as if different sections have been
written by different authors and no-one has edited the whole
to give it a coherent structure, or to ensure consistency
(both technical and editorial) between the parts. See
detailed comments.
[Linda] we can improve.
#. The quality of the English grammar does not allow a
reviewer to concentrate on the technical aspects rather than
the English. It would have been useful if one of the
English-speaking co-authors had improved the English before
submission for review.
See detailed comments.
[Linda] can you help? Becoming a co-author to improve?
==Detailed Comments==
===#. Normative statements===
In the body of the document, there is just one occurrence
of normative text (actually two "MUST"s, but both state a
common requirement - just written separately for IPv4 and
IPv6). This merely serves to imply that everything else the
document says is less
important or optional, which was probably not the intention.
[Linda] The goal is to indicate any solution in moving
the VM “MUST” follow this rule. They make sense, aren’t
they?
At the start there is a requirements section, which
states what a VM Mobility protocol "SHOULD" or "SHOULD NOT"
do. I think this is intended as a set of goals for the rest
of the document. If so, these "SHOULDs" are not intended to
apply to implementations,
so they ought not to be capitalized.
[Linda] okay, will change.
The first requirement, "Data center network SHOULD
support virtual machine mobility in IPv6", is written as a
requirement on all DC networks, not on implementations. I
assume this was intended to read as "Data center network
virtual machine mobility protocols
SHOULD support IPv6". Even then, it doesn't really add
anything to say VM mobility should support v6 and it should
support v4. A L2 solution won't. While undoubtedly, a L3
solution will at least support one of them.
[Linda]Agree. Will change it to “Data center that support
IPv6 address should …”
I'm not sure that 'protocol' is the right word anyway; I
think 'VM Mobility procedure' would be a better phrase,
because it includes steps such as suspending the VM, which
is more than a protocol.
[Linda] yes. Will change to “Procedure”.
The requirement "Virtual machine mobility protocol MAY
support host routes to accomplish virtualization", is not
followed up at all in the rest of the draft.
Even if this requirement stays, the last 3 words should
be deleted.
[Linda] will change to “Host Route can be used to
support the Virtual Machine Mobility Procedure.”
By the end of the draft, the solution falls far short of
the most relevant "Requirements" anyway, so one assumes the
title of the section ought to have been "Goals".
Specifically, even in the simpler case of L2 VM mobility,
S.4.1 says that triangular routing
and tunnelling persist "until a neighbour cache entry times
out". A cache timeout is about 10 orders of magnitude longer
than the requirement to only persist "while handling packets
in flight", which would be a few milliseconds at most (the
time for packets
to clear the network that were already launched into flight
when the old VM stopped).
Whatever, it would be preferable for the draft to give
rationale for these requirements, rather than just assert
them. This would help to shed light on the merits of the
different trade offs that solutions choose.
[Linda] Agree, will add.
===#. Mobility vs. Redundancy===
Redundancy and mobility have a lot of similarities, but
they have different goals. With mobility, it is necessary to
know the exact instant when one set of state is identical to
the other so it can hand over. With redundancy, the aim is
to keep two (or
more) sets of state evolving through the same sequence of
changes, but there is no need to know the point at which one
is the same as the other was at a certain point.
[Linda] Agree with what you said. There is only one usage
of “redundancy” in the entire document, used under the
context of “Hot standby option”, indicating the
“redundancy” of “the VMs in both primary and secondary
domains have identical information
and can provide services simultaneously as in load-share
mode of operation” being expensive.
The draft slips from mobility to resilience in the
following places:
* S.2. Terminology: Warm VM Mobility is defined without
any ending, as if it is permanent replication. * S.7.
"Handling of Hot, Warm and Cold Virtual Machine Mobility" is
actually all about redundancy, and doesn't address mobility
explicitly.
[Linda] Will add the definition “Hot Migration”, “cold
migration”, and “warm migration”.
===#. Terminology===
Packets run from the source at A to the destination at B
via NVE1, then via NVE2. Please don't call NVE1 and NVE2 the
source NVE and the destination NVE.
In future, no-one will thank you for the apparent
contradictions when they continually stumble over phrases
like this one in S.4.1: "...send their packets to the source
NVE".
The term "packets in flight" is used incorrectly to refer
to all the packets sent to the old NVE after the VM has
moved, even if they were launched into flight long after the
old VM stopped receiving packets.
[Linda] thank for the comments. Will change.
BTW, I think s/before/after/ in: "that have old ARP or
neighbor cache entry before VM or task migration".
I think: s/IP-based VM mobility/L3 VM mobility/
throughout, because "based"
sounds (to me) like the mobility control protocol is over
(i.e. based on) IP.
===#. Applicability===
In section 4.2 it says that the protocol mostly used as
the IP based task migration protocol is ILA. This implies
that all hosts corresponding with the mobile VMs are either
part of the same controlled environment, or they are proxied
via nodes that are
part of the same controlled environment (I only have passing
knowledge of ILA, but I understand that it depends on ILA
routers on the path). If I am correct, this aspect of scope
needs to be made clear from the start.
Also under the heading of applicabiliy, the sentence
"Since migrations should be relatively rare events" appears
very late in the document (S.4.2.1). The assumed level of
churn ought to be stated nearer the start.
[Linda] yes, under the same controlled environment.
===#. L3 Mobility===
L2 VM mobility is independent of the application, because
resolution of L2 mappings is delegated to the stack. In
contrast, L3 VM mobility is only feasible under certain
conditions, because an application needs an IP address to
open a socket (resolution
of DNS names is not delegated to the stack, and apps can use
IP addresses directly anyway).
Examples of the 'certain conditions':
a) /All/ applications used in the whole DC load balancing
scheme contain IP address migration logic for /all/ their
connections; b) VMs running solely applications that support
IP address migration register this fact with the NVA, and it
only select such
VMs for mobility. c) An abstraction is layered over /all/
the IP addresses exposed to applications (at both ends) so
that the IP addresses that applications use are solely
identifiers (e.g. ILA, LISP, HIP), not also locators.
The introduction says the draft is about VM mobility in a
multi-tenant DC, so the DC admin will not know the range of
applications being used. This excludes condition (a) above.
When the draft says "...if all applications running are
known to handle this
gracefully...", it doesn't quantify just how restrictive
this condition is, and it gives no explanation of how this
knowledge might be 'known' or which function within the
system 'knows' it.
S.4.2.1 contains what seems like plenty of arm-waving.
* "TCP connections could be automatically closed in the
network stack during a migration event."
o There is no TCP connection state in the network
stack.
o Even if the network starts to drop every
packet, the TCP connection
state persists in the end-points for a duration
of the order of 30-90
minutes (OS-dependent) before TCP deems the
connection is broken. o
Other transport protocols have similar designs
(including the app-layer
of protocols over UDP).
* "More involved approach to connection migration":
o pausing the connection [does this refer to an
actual feature of any
L4 protocol?] o packaging connection state and
sending to target [does
this assume logic written into the application,
or is this assuming the
stack handles this and the app is restricted to
using some form of
separate identifier/locator addresses?] o
instantiating connection
state in the peer stack [ditto?].
There's some arm-waving in S.7 too:
"Cold Virtual Machine mobility is facilitated by the VM
initially
sending an ARP or Neighbor Discovery message at the
destination NVE
but the source NVE not receiving any packets
inflight."
[How is it arranged for the source NVE not to receive
any packets in flight?]
And in S.7:
"In hot
standby option, regarding TCP connections, one option
is to start
with and maintain TCP connections to two different VMs
at the same
time."
[This sounds like resilience logic has been written
into the application,
which would be a special case but not something VM
mobility infrastructure
could depend on.]
[Linda] will add.
===#. Gaps===
#. Security Considerations: repeats issues in other
drafts that are not specific to mobility, but it does not
mention any security issues specifically due to VM mobility.
It says that address spoofing may arise in a DC (sort-of
implying it is worse than
in non-DC environments, but not saying why). The handshake
at the start of a connection (e.g. TCP, SCTP, QUIC) checks
for source address spoofing. So L3 VM mobility would be more
vulnerable to source address spoofing in cases where the
mobile VM was the connection
initiator and there was not a new handshake after the move.
However, this draft does not contain any detailed mobility
protocols, so it is not possible to identify any specific
security flaws.
#. Transport Issues: Effect of delay on the transport:
Cold mobility introduces significant delay, and other forms
less, but still some delay. It should be pointed out that
some applications (e.g. real-time) will therefore not be
useful if subjected to
VM mobility. Similarly, even a short period of delay will
drive most congestion controls to severely reduce
throughput. These points might be self-evident, but perhaps
they should be stated explicitly.
BTW, in the L3 VM mobility case, the draft often refers
to TCP connections, but the address bindings of any
transport protocols would have to be migrated due to VM
mobility (e.g. SCTP; sequences of datagrams over UDP;
streams over UDP such as with RTP,
QUIC).
#. Management Issues: perhaps the draft ought to
recommend statistics gathering (e.g. time taken, amount of
duplicate data) to aid a DC's future decisions on the
cost-benefit of moving a VM. The OPSDIR review says a BCP
does not /have/ to describe management
issues, but this document seems to describe a whole system
procedure, not just a protocol, which then surely includes
the management plane.
[Linda] can you become a co-author and add those in?
===#. Incoherent Structure===
S.4.1. happens to talk about VMs moving, while S.4.2.
happens to talk about tasks moving, but this is not the
distinguishing aspect of these two sections (anyway, S.2.
says "the draft uses task and VM interchangeably"): * "4.1
VM Migration" is about "L2
VM Mobility" so this ought to be the section heading, *
"4.2 Task Migration" is about "L3 VM Mobility" so this
ought to be the section heading. It would also help not to
switch from VM to task across these sections
- it's just a distraction.
S.4.1 needs better signposting of where each sub-case
ends (Subsections might be useful to solve this): * IPv4 *
end-user client * 2 paras starting "All NVEs communicating
with this virtual machine..." [Not clear that the end-user
case has ended and we
have returned to the general IPv4 case?] * IPv6 [Strictly,
it still hasn't said whether the end-user client case has
ended.] [Also, it doesn't explain why there is no need for
an end-user client case under IPv6?] Sections 5 & 6 seem
to be about either L2 or
L3 mobility, whereas Sections 7 &
8 seem to be restricted to L2.
The draft vacillates over what to do with packets
arriving at the old NVE in the L3 case (see also L3 mobility
above): * S4.2 first says packets are dropped, possibly with
an ICMP error message;
o then later it says they are silently dropped;
o then in the very next sentence it says either
silently drop them or forward
them to the new location
* S.5 says they should not be lost, but instead delivered
to the destination hypervisor
o then it describes how they are tunnelled (which is
not the same as
"forwarding").
The order in which all the stages of mobilty are given is
jumbled up across sections that also appear in arbitrary
order: * S.5 prepares, establishes uses then stops a tunnel,
but it doesn't say where the other stages fit between these
steps
o When tunneling packets, it talks about the
*migrating* VM not the
*migrated* VM, which implies tunnelling has
started before the new VM
is running. Does this imply there is a huge
buffer? o It says "Stop
Tunneling Packets - When source NVE stops
receiving packets destined
to..." but it is never clear when a source has
stopped sending packets
to a destination, unless it explicitly closes the
connection (e.g. with
a FIN in the case of TCP). Often there are long
gaps between packets,
because many flows are 'thin' (meaning the
application frequently has
nothing to send). These gaps can last for
milliseconds, hours or even
days without any implication that the connection
has ended.
* Then S.6. describes moving state, but doesn't say that
this is not after the previous tunnelling steps (or where it
fits within those steps). * Then S.7 describes hot, warm and
cold mobility, but doesn't lay out the tunnelling or steps
to move state
in each case. * Then S.8 says it's about VM life-cycle, but
just gives the very first 3 steps for allocation of
resources to a VM, then abruptly ends, without even starting
the VM, let alone getting to move it.
S.5 exhibits another inconsistency by talking about the
hypervisor, not the NVE.
==#. Nits==
Nits with the English are too numerous to mention them
all. Below are pointers to general problems as well as some
individual instances.
S.4
"Layer 2 and Layer 3 protocols are described next. In
the following
sections, we examine more advanced features."
s/following/subsequent/
S.4.1
Expand WSC, MSC and NVA on first use.
s/the VM moves in the same link/the VM moves in the same
subnet/
"i.e. end-user clients ask for the same MAC address upon
migration. [...] to ensure that the same IPv4 address is
assigned to the VM." I think s/IPv4/MAC/ was intended?
" All NVEs communicating with this virtual machine uses
the old ARP
entry. If any VM in those NVEs need to talk to the
new VM in the
destination NVE, it uses the old ARP entry."
Repetition: these 2 sentences say the same. (The mistake
is also repeated when these 2 sentences are repeated for
IPv6).
S.4.2.1
s/Push the new mapping to hosts./Push the new mapping to
communicating hosts./
S.5.
The IPv4/IPv6 pairs of paras for "tunnel estabilshment"
and "tunneling packets"
only differ in the words "IPv4"/"IPv6". So in each case a
single para could be given for IP (irrespective of whether
v4 or v6).
Thank you very much.
Linda Dunbar