Re: [Last-Call] Genart last call review of draft-ietf-anima-asa-guidelines-04

Brian E Carpenter <brian.e.carpenter@xxxxxxxxx> · Tue, 7 Dec 2021 09:54:16 +1300

Hi Thomas,

Thanks for the careful reading and review. I think we can deal
with all your comments without difficulty. Just two possible
discussion points in line below.

Regards
   Brian

On 07-Dec-21 03:58, Thomas Fossati via Datatracker wrote:
Reviewer: Thomas Fossati
Review result: Ready with Issues

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://trac.ietf.org/trac/gen/wiki/GenArtfaq>.

Document: draft-ietf-anima-asa-guidelines-??
Reviewer: Thomas Fossati
Review Date: 2021-12-06
IETF LC End Date: 2021-12-13
IESG Telechat date: Not scheduled for a telechat

Summary:

The document contains guidance for building ASAs.  It discusses
different kinds of requirements and their impact on the software
architecture.  It looks like a useful doc to have.

In general, the document reads very well, with the exception of Section
6 - see "Minor issues" below.

Major issues:

Minor issues:

In Section 6.3, I have followed the reference to
draft-peloso-anima-autonomic-function and I noticed that the content of
Section 6 has been transplanted nearly as-is from there.  So, to avoid
redundancy, I wonder whether that content should be given the same
treatment as you do in Section 7 WRT draft-ciavaglia-anima-coordination?
Or maybe you want to re-think the approach and have Section 7 do similar
copy&paste from draft-ciavaglia-anima-coordination?  They are both
individual and expired drafts after all, so it's probably better to do
the latter.

I also wonder whether it is worth spelling out explicitly the fact that,
given ASAs may need to co-exist with the actual networking application,
they should be built to require a minimal memory footprint and, in
general, use system resources with parsimony.  A related question is whether ASAs
require dedicated system resources in order to continue operating in a
busy system?

Generally we expect that ASAs will run at a much lower frequency than
any "production" workload in the node, so CPU load should not be a big
issue, but memory footprint in a constrained node is certainly a
concern. We tend to assume that ASAs will be mainly installed in
non-constrained devices, or that if they are in a constrained device,
they'll have a subset of functionality. Officially, we punted on that
issue - RFC8993 says "At a later stage, the ANIMA Working Group may
define a scope for constrained nodes with a reduced ANI and well-
defined minimal functionality."

Nits/editorial comments:

Section 2.

    *  Repeatedly flood an objective to the AN, so that any ASA can

Expand "AN" on first use.

    These threads should all either exit after their job is done, or
    enter a wait state for new work, to avoid blocking others
    unnecessarily.

"blocking others unnecessarily" is not what would typically happen,
maybe "to avoid wasting system resources" ?

    [...] It
    should also do whatever is required to avoid unnecessary resource
    consumption, such as including an arbitrary wait time in each cycle
    of the main loop.

I am not sure what "arbitrary wait time" refers to?  Is it a "sleep(n)"
at the end of each iteration of the main loop?  I think it's the
parsimony principle that you want to highlight here, and the first part
of the sentence is sufficient for capturing that without going into
concrete examples.
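
For readers unsure what a per-cycle wait might look like, here is a
minimal sketch of one reading of the quoted text. It is not taken from
the draft: the names run_main_loop, do_work and wait_seconds are
hypothetical, and the choice of a stoppable wait rather than a plain
sleep is an assumption.

```python
import threading


def run_main_loop(do_work, stop_event, wait_seconds=1.0):
    """Call do_work() once per cycle, pausing between cycles until stopped.

    Hypothetical helper: do_work stands in for whatever the ASA does
    in one iteration of its main loop.
    """
    cycles = 0
    while not stop_event.is_set():
        do_work()
        cycles += 1
        # Event.wait() pauses for up to wait_seconds, but returns early
        # if another thread asks the loop to stop.
        stop_event.wait(wait_seconds)
    return cycles
```

Waiting on an event instead of calling sleep() keeps the pause from
delaying shutdown, which fits the same aim of not wasting resources.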

Section 3.3

    This API is intended to support the various interactions expected
    between most ASAs, such as the interactions outlined in Section 2.
    However, if ASAs require additional communication between themselves,
    they can do so using any desired protocol, even just a TLS session if
    that meets their needs.  One option is to use GRASP discovery and

What is the meaning of "just" in "just a TLS session"?  Also it's not
clear what kind of messages would flow through this additional channel
and if there are any requirements in terms of their security properties.

    [...] As
    noted above, the ACP can secure such communications, unless there is
    a good reason to do otherwise.

Maybe s/can/should/ and drop "unless ... otherwise"?

Section 6.1.1.

The typography used here to define inputs is a bit odd.  And in general
the whole section probably needs some more attention from an editorial
point of view.

Section 6.2

    the agent piece of code (when this does not start automatically) and

Maybe drop "piece of".

Section 6.2.1

    The operator's goal can be summarized in an instruction to the ANIMA
    ecosystem matching the following format:

       [instances of ASAs of a given type] ready to control
       [Instantiation_target_Infrastructure] with
       [Instantiation_target_parameters]

Maybe better to move this to the beginning of Section 6.2.2.

Section 6.2.3

As in Section 6.1.1., the typographic style used here is a bit odd /
unconventional.

Section 6.3

    Note: This section is to be further developed in future revisions of
    the document, especially the implications on the design of ASAs.

Is this note still valid?  (I hope not :-) )

Section 10

    of robustness that ASA designers should consider

Maybe stick a colon at the end of the line.

    1.   If despite all precautions, an ASA does encounter a fatal error,
         it should in any case restart automatically and try again.  To
         mitigate a hard loop in case of persistent failure, a suitable

Terminology: what do you mean by "hard loop"?
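
One possible reading of "hard loop" is an ASA that fails, restarts
immediately, and fails again without pause. Under that assumption, the
sketch below shows a restart wrapper that pauses between attempts. The
names run_with_restart and start_asa, the retry limit, and the doubling
delay are all hypothetical choices, not something the draft specifies.

```python
import time


def run_with_restart(start_asa, max_restarts=5, base_delay=1.0,
                     max_delay=60.0, sleep=time.sleep):
    """Run start_asa(), restarting after a failure with a growing pause.

    start_asa is a hypothetical callable that runs the ASA and raises
    an exception on a fatal error. The sleep parameter exists only so
    the pause can be replaced in tests.
    """
    delay = base_delay
    for attempt in range(max_restarts + 1):
        try:
            return start_asa()
        except Exception:
            if attempt == max_restarts:
                # Persistent failure: give up instead of looping forever.
                raise
            sleep(delay)
            # Double the pause each time, up to a ceiling.
            delay = min(delay * 2, max_delay)
```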

    8.   On the other hand, the definitions of GRASP objectives are very
         likely to be extended, using the flexibility of CBOR or JSON.
         Therefore, ASAs should be able to deal gracefully with unknown
         components within the values of objectives.

Is this in line with Section 6 of draft-iab-protocol-maintenance?
I.e., has GRASP clearly defined extensibility rules, or is this a call
for the ASA implementation to apply the robustness principle?
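
As a concrete illustration of "deal gracefully with unknown
components", the sketch below keeps the fields an ASA understands and
sets aside the rest instead of rejecting the whole value. It assumes a
JSON-encoded objective value that is an object; the field names in
KNOWN_FIELDS and the function parse_objective_value are hypothetical
and not defined by GRASP.

```python
import json

# Hypothetical set of fields this ASA version understands.
KNOWN_FIELDS = {"prefix", "prefix_length"}


def parse_objective_value(raw):
    """Split a JSON objective value into known fields and unknown keys."""
    value = json.loads(raw)
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    known = {k: v for k, v in value.items() if k in KNOWN_FIELDS}
    # Unknown keys are reported, not treated as an error, so a newer
    # peer can extend the objective without breaking this ASA.
    unknown = sorted(k for k in value if k not in KNOWN_FIELDS)
    return known, unknown
```

Whether ignoring unknown fields is the right policy depends on the
extensibility rules in question, which is exactly the point raised
above.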

    At a slightly more general level, ASAs are not services in
    themselves, but they automate services.  This has a fundamental
    impact on how to design robust ASAs.  In general, when an ASA
    observes a particular state [1] of operations of the services/

"[1]" looks like a bib reference, please consider using an alternative
typography, e.g., "(1)", or "A"

Section 11

    ASAs are intended to run in an environment that is protected by the
    Autonomic Control Plane [RFC8994], admission to which depends on an
    initial secure bootstrap process such as [RFC8995].

s/such as [RFC8995]/such as BRSKI [RFC8995]/

    In particular, they must use secure techniques and carefully
    validate any incoming information.

"secure techniques" could be unpacked a bit, for example: "secure coding
practices" (e.g., input validation, least privilege, etc.), "secure
configuration practices" (e.g., default deny).

Appendix C

    An implementation requirement is that resource pools are kept in
    stable storage.  Otherwise, if a delegator exits for any reason, all
    the resources it has obtained or delegated are lost.  If an origin
    exits, its entire spare pool is lost.  The logic for using stable
    storage and for crash recovery is not included in the pseudocode
    below.

Is there a further requirement for the storage to be shared across all
ASAs?  What I am wondering is whether a shared global map of the current
resource allocations exists to help reconstruct a partitioned
topology (in case one ASA disappears)?  Or is the delegated resource
recall, in case the ASA delegator fails, handled by GRASP?

I think the answer depends on the resource. For the one that we fully
defined (IP address prefixes, RFC8992) there certainly needs to be a
solid logging and recovery mechanism, as there is for traditional IPAM
systems. Since GRASP operations are not intrinsically idempotent, that
must be done by the ASAs. I don't think it can be a single global map,
because it has to survive network partition and reconnection. The
global map could be constructed if necessary from the log in each ASA.
On the other hand, if the resource being shared is upstream network
capacity from a given router, which is shared among many downstream
routers, there is no need for a global map.
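
To make the "constructed if necessary from the log in each ASA" idea
concrete, here is a rough sketch under stated assumptions: each ASA
keeps an ordered log of ("acquire" | "release", resource) entries, and
the log format, ASA identifiers and function names are all
hypothetical. Each log is replayed on its own, so the result does not
depend on the order in which the logs are collected.

```python
def holdings_from_log(entries):
    """Replay one ASA's log and return the resources it currently holds."""
    held = set()
    for action, resource in entries:
        if action == "acquire":
            held.add(resource)
        elif action == "release":
            held.discard(resource)
    return held


def build_global_map(logs):
    """Map each held resource to the list of ASAs whose logs claim it.

    logs is a dict of ASA identifier -> ordered list of log entries.
    A resource claimed by more than one ASA signals a conflict that
    needs resolving, for example after a partition heals.
    """
    owners = {}
    for asa_id, entries in logs.items():
        for resource in holdings_from_log(entries):
            owners.setdefault(resource, []).append(asa_id)
    return owners
```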
