Hi Martin,
Thanks for the careful review. I've inserted a few comments in line
below, but we will take care of all your points in the next version.
Regards
Brian
On 13-Dec-21 22:36, Martin Dürst via Datatracker wrote:
Reviewer: Martin Dürst
Review result: Ready with Issues
I'm not an expert in autonomous systems, and don't have experience reviewing
guideline/operational drafts (as opposed to protocol/format drafts).
Overall, I'm not completely sure this document is needed. Some of it sounds
like very obvious advice (paraphrased: install stuff, then configure it, only
then let it run). Other stuff may be covered in much more depth in a tutorial
or book about fault-tolerant and real-time systems. A pointer to such
literature might help.
Our impression is that typical network ops people do need such advice.
You're correct that a lot of it is familiar to experts in control systems
with fault tolerance and real time requirements. (Personally I worked on
real time control systems before I got into networking.)
The rest of this review is based on the assumption that the document serves an
advisory purpose, and there are enough people supporting its publication.
This document is close to ready, but a few places would benefit from
updated/clarified wording or fixes.
Introduction:
"could be upgraded as an ASA" -> "could be upgraded to an ASA"
Section 3.3:
"This API is intended": Say which API. The last mention ("The GRASP API") is
quit a bit away.
The following two sentences overlap very much in content; maybe one of them is
unnecessary:
The format and size
of the value is not restricted by the protocol, except that it must
be possible to serialise it for transmission in CBOR [RFC8949],
subject only to GRASP's maximum message size as discussed in
Section 5.
The actual value field of an objective is limited by the GRASP
protocol definition to any data structure that can be expressed in
Concise Binary Object Representation (CBOR) [RFC8949].
Figure 1 (Section 6): The diagram starts and ends at "Undeployed". I wonder
whether "Uninstalled" may be more appropriate.
6.2:
"on the same autonomic node and resources it is controlling" -> "on the
same
autonomic node as the resources it is controlling"
6.2.2:
"like a domain a technology segment or" -> "like a domain*,* a technology
segment or"
6.2.3:
"ASA Instance Mandate" seems to be an established term, but I don't find a
definition or pointer to one.
6.3:
The note, if really needed, should come towards the end, or be removed.
It
reads like an admission that the draft isn't ready for publication.
There are two lists, but they have neither bullets nor numbers, which looks
very strange.
The first bulleted list has some wording problems: "meaning enabling those":
what does "those" refer to? "meaning setting them": what does "them" refer to?
"by updating": This should not come after the colon.
Section 6, last paragraph: "The ASA life-cycle is discussed in more detail in
"A Day in the Life of an Autonomic Function"
[I-D.peloso-anima-autonomic-function].": This comes way too late. And it's way
unclear why there is a need for two documents when the longest chapter of this
document already discusses the life cycle. (And the authors should agree on
whether they want to spell life cycle with or without a hyphen.)
We should probably rephrase to indicate that the draft was a source of ideas
rather than a current reference, unless Pierre plans to restart work on it.
Sections 7, 8, and 9 seem way too short for sections. At least sections
7 and 8
could be easily combined, because they are both about coordination. Section 9
should be absorbed in an appropriate location, it's even shorter than sections
7 and 8.
Section 7: "a method is needed of identifying" -> "a method is needed to
identify"
Section 9: "quite likely to be expressed" -> "quite likely expressed"
Section 10, sentence before numbered list should end with a period or colon.
Section 10, list item number 8: It should be defined clearer whether GRASP
objectives work according to a "must ignore" model for extensibility or
not.
This is not the right location to do this, but if a base spec does it, this
text should clearly reflect that, and if there is nothing in the base spec,
that's a serious problem there.
Each objective has its own base spec, so that's where extensibility should
be covered. There's a general mechanism (the GRASP M_INVALID message) whereby
a recipient can signal "I didn't understand". Along with the O_ACCEPT and
O_DECLINE options in a negotiation, that allows for all sorts of extensibility
policies.
Section 10, last paragraph: "state [1]", "state[2]", "state[3]": Preferably use
other labels to distinguish states; these here look too much like roferences.
Security considerations: "Such an ASA must also be designed to avoid
loopholes": Some concrete examples would help here.
Reference [Huebscher08]: "computing--degrees": Either use a wider hyphen or
just "computing-degrees".
Reference [RFC8368]: In my printout, the label and the contents of this
reference ended up on different pages. This is probably a general CSS/browser
printer problem, but should be checked.
Appendix B: ASA: "An agent implemented on an autonomic node that implements":
"implemented ... that implements" is confusing. "either in part (in the
case of
a distributed function) or whole": "either in part (in the case of a
distributed function) or as a whole"
Appendix C, Example Logic Flows: "This appendix describes generic logic
flows":
It should be made clear early on that these are not alternatives (as the plural
may suggest), but are a set of flows that work together.
Appendix C, second paragraph: "would" is used twice, but can easily be replaced
by "will".
Appendix C, "When running as an origin": Shouldn't this also say that the
origin gives out resources to the delegators?
Appendix C, "When running as a delegator": Where do the resources come from?
"quantities of the resource" vs. "it hands out resource": is "resource"
countable or not? It should be the same throughout the draft.
Appendix C, "in stable storage": What is "stable storage"? SSDs and HDs
can
easily be damaged, too. If resources can really get lost, there seems to be a
need to re-establish a resource distribution by some kind of protocol that
checks who had or claims to have had what, and find the gaps. Also, this
ignores the whole problem of atomic reads/writes. What happens if a delegator
goes down before it has had time to write the resources into stable storage?
Another problem: It's not the resources that are written to storage, it's just
information about them.
Of course these are real problems that an implementation must solve.
It wasn't our intention to cover everything here, just the usage of
the ANIMA infrastructure. We should make that clearer in the text.
Appendix C: "a suitable timeout expires": This is very close to "a suitable
timeout times out", which sounds strange. What about "a suitable timeout is
reached" or "a suitable timeout is triggered"?
Appendix C: "consumers routers" -> "consumers' routers" or what?
<CODE BEGINS>: Is this suitable for pseudocode that can't be executed?
I'm not sure. The advantage of <CODE BEGINS> is that it changes the
derivative works status, according to the IETF Trust legal provisions.
In the "do forever" loop in the MAIN PROGRAM, the assignment "good_peer
= peer"
does not seem to make sense, because good_peer is set to none at the start of
the forever loop. Does the initialization "good_peer = none" need to go before
the start of the loop?
I'm going to go back to my Python implementation of RFC8992 to check that...
Brilliant catch! That's a transcription error when I transcribed Python
into pseudocode. The Python says:
good_peer = None # where we remember a helpful peer ASA
next_peer = 0 # for cycling through peers
while True:
...
In MAIN_NEGOTIATOR thread, the line
"i.e., wait for M_REQ_NEG"
has two problems: 1) it should be clearer that this is a comment, and 2) it's
unclear what M_REQ_NEG refers to.
That assumes detailed knowledge of the GRASP protocol. Probably simpler
if we delete both mentions of M_REQ_NEG.
In DELEGATOR thread:
The two lines in the "if OK" case have to be executed as an atomic unit
if one
wants to avoid resources getting lost or being doubly used.
True, but there is a single instance of the thread. So it's atomic
unless the thread crashes right then, in which case the resource is
gone from the pool and lost. I'm not sure computer science has a
solution to that problem.
In FLOODER thread:
How long is the sleep() or periodic timer supposed to be?
That's really app-dependent. We'll see if we can express that better.
--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call