Re: [Last-Call] Artart last call review of draft-ietf-anima-asa-guidelines-04

Martin J. Dürst <duerst@xxxxxxxxxxxxxxx> · Tue, 14 Dec 2021 15:59:56 +0900

Hello Brian,

On 2021-12-14 06:17, Brian E Carpenter wrote:
Hi Martin,

Thanks for the careful review. I've inserted a few comments in line
below, but we will take care of all your points in the next version.

Glad to be of help. Sorry most of it sounded pretty unpersonal.

A few comments inline.

Regards
    Brian

On 13-Dec-21 22:36, Martin Dürst via Datatracker wrote:
Reviewer: Martin Dürst
Review result: Ready with Issues

I'm not an expert in autonomous systems, and don't have experience 
reviewing
guideline/operational drafts (as opposed to protocol/format drafts).

Overall, I'm not completely sure this document is needed. Some of it 
sounds
like very obvious advice (paraphrased: install stuff, then configure 
it, only
then let it run). Other stuff may be covered in much more depth in a 
tutorial
or book about fault-tolerant and real-time systems. A pointer to such
literature might help.

Our impression is that typical network ops people do need such advice.
You're correct that a lot of it is familiar to experts in control systems
with fault tolerance and real time requirements. (Personally I worked on
real time control systems before I got into networking.)

I see, makes sense.

The rest of this review is based on the assumption that the document 
serves an
advisory purpose, and there are enough people supporting its publication.

This document is close to ready, but a few places would benefit from
updated/clarified wording or fixes.

Introduction:
"could be upgraded as an ASA" -> "could be upgraded to an ASA"

Section 3.3:
"This API is intended": Say which API. The last mention ("The GRASP 
API") is
quit a bit away.

The following two sentences overlap very much in content; maybe one of 
them is
unnecessary:
    The format and size
    of the value is not restricted by the protocol, except that it must
    be possible to serialise it for transmission in CBOR [RFC8949],
    subject only to GRASP's maximum message size as discussed in
    Section 5.

    The actual value field of an objective is limited by the GRASP
    protocol definition to any data structure that can be expressed in
    Concise Binary Object Representation (CBOR) [RFC8949].

Figure 1 (Section 6): The diagram starts and ends at "Undeployed". I 
wonder
whether "Uninstalled" may be more appropriate.

6.2:
"on the same autonomic node and resources it is controlling" -> "on the 
same
autonomic node as the resources it is controlling"

6.2.2:
"like a domain a technology segment or" -> "like a domain*,* a technology
segment or"

6.2.3:
"ASA Instance Mandate" seems to be an established term, but I don't 
find a
definition or pointer to one.

6.3:
The note, if really needed, should come towards the end, or be removed. 
It
reads like an admission that the draft isn't ready for publication.

There are two lists, but they have neither bullets nor numbers, which 
looks
very strange.

The first bulleted list has some wording problems: "meaning enabling 
those":
what does "those" refer to? "meaning setting them": what does "them" 
refer to?
"by updating": This should not come after the colon.

Section 6, last paragraph: "The ASA life-cycle is discussed in more 
detail in
"A Day in the Life of an Autonomic Function"
[I-D.peloso-anima-autonomic-function].": This comes way too late. And 
it's way
unclear why there is a need for two documents when the longest chapter 
of this
document already discusses the life cycle. (And the authors should 
agree on
whether they want to spell life cycle with or without a hyphen.)

We should probably rephrase to indicate that the draft was a source of 
ideas
rather than a current reference, unless Pierre plans to restart work on it.

Sections 7, 8, and 9 seem way too short for sections. At least sections 
7 and 8
could be easily combined, because they are both about coordination. 
Section 9
should be absorbed in an appropriate location, it's even shorter than 
sections
7 and 8.

Section 7: "a method is needed of identifying" -> "a method is needed to
identify"

Section 9: "quite likely to be expressed" -> "quite likely expressed"

Section 10, sentence before numbered list should end with a period or 
colon.

Section 10, list item number 8: It should be defined clearer whether 
GRASP
objectives work according to a "must ignore" model for extensibility or 
not.
This is not the right location to do this, but if a base spec does it, 
this
text should clearly reflect that, and if there is nothing in the base 
spec,
that's a serious problem there.

Each objective has its own base spec, so that's where extensibility should
be covered. There's a general mechanism (the GRASP M_INVALID message) 
whereby
a recipient can signal "I didn't understand". Along with the O_ACCEPT and
O_DECLINE options in a negotiation, that allows for all sorts of 
extensibility
policies.

In that case, write something like: "For each GRASP objective, make sure 
you follow the relevant extensibility requirements."

The way it's currently worded, an implementer may make things extensible 
where the GRASP objective spec prohibits that.

Section 10, last paragraph: "state [1]", "state[2]", "state[3]": 
Preferably use
other labels to distinguish states; these here look too much like 
roferences.

Security considerations: "Such an ASA must also be designed to avoid
loopholes": Some concrete examples would help here.

Reference [Huebscher08]: "computing--degrees": Either use a wider 
hyphen or
just "computing-degrees".

Reference [RFC8368]: In my printout, the label and the contents of this
reference ended up on different pages. This is probably a general 
CSS/browser
printer problem, but should be checked.

Appendix B: ASA: "An agent implemented on an autonomic node that 
implements":
"implemented ... that implements" is confusing. "either in part (in the 
case of
a distributed function) or whole": "either in part (in the case of a
distributed function) or as a whole"

Appendix C, Example Logic Flows: "This appendix describes generic logic 
flows":
It should be made clear early on that these are not alternatives (as 
the plural
may suggest), but are a set of flows that work together.

Appendix C, second paragraph: "would" is used twice, but can easily be 
replaced
by "will".

Appendix C, "When running as an origin": Shouldn't this also say that the
origin gives out resources to the delegators?

Appendix C, "When running as a delegator": Where do the resources come 
from?
"quantities of the resource" vs. "it hands out resource": is "resource"
countable or not? It should be the same throughout the draft.

Appendix C, "in stable storage": What is "stable storage"? SSDs and HDs 
can
easily be damaged, too. If resources can really get lost, there seems 
to be a
need to re-establish a resource distribution by some kind of protocol 
that
checks who had or claims to have had what, and find the gaps. Also, this
ignores the whole problem of atomic reads/writes. What happens if a 
delegator
goes down before it has had time to write the resources into stable 
storage?
Another problem: It's not the resources that are written to storage, 
it's just
information about them.

Of course these are real problems that an implementation must solve.
It wasn't our intention to cover everything here, just the usage of
the ANIMA infrastructure. We should make that clearer in the text.

Appendix C: "a suitable timeout expires": This is very close to "a 
suitable
timeout times out", which sounds strange. What about "a suitable 
timeout is
reached" or "a suitable timeout is triggered"?

Appendix C: "consumers routers" -> "consumers' routers" or what?

<CODE BEGINS>: Is this suitable for pseudocode that can't be executed?

I'm not sure. The advantage of <CODE BEGINS> is that it changes the
derivative works status, according to the IETF Trust legal provisions.

In the "do forever" loop in the MAIN PROGRAM, the assignment "good_peer 
= peer"
does not seem to make sense, because good_peer is set to none at the 
start of
the forever loop. Does the initialization "good_peer = none" need to 
go before
the start of the loop?

I'm going to go back to my Python implementation of RFC8992 to check 
that...

Brilliant catch! That's a transcription error when I transcribed Python
into pseudocode. The Python says:

good_peer = None # where we remember a helpful peer ASA
next_peer = 0    # for cycling through peers
while True:
     ...

In MAIN_NEGOTIATOR thread, the line
"i.e., wait for M_REQ_NEG"
has two problems: 1) it should be clearer that this is a comment, and 
2) it's
unclear what M_REQ_NEG refers to.

That assumes detailed knowledge of the GRASP protocol. Probably simpler
if we delete both mentions of M_REQ_NEG.

In DELEGATOR thread:
The two lines in the "if OK" case have to be executed as an atomic unit 
if one
wants to avoid resources getting lost or being doubly used.

True, but there is a single instance of the thread. So it's atomic
unless the thread crashes right then, in which case the resource is
gone from the pool and lost.

There's probably a single instance of this thread, but other threads may 
be running, too. And you don't know whether some implementation might 
not choose to use more than one thread.

I'm not sure computer science has a
solution to that problem.

I agree. But there are things to alleviate this, e.g. making sure that 
both operations are carried out together without other, unrelated 
threads or processes interfering, or even just making sure the details 
are done so that any superfluous stuff gets done before or after, but 
not in between. Also, it's probably a good think to say that there's no 
complete solution, because this can affect how operators handle things 
when there have been problems.

Regards,   Martin.

In FLOODER thread:
How long is the sleep() or periodic timer supposed to be?

That's really app-dependent. We'll see if we can express that better.

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call