Re: [PATCH v12 01/19] Documentation/x86: Secure Launch kernel documentation

Jarkko Sakkinen <jarkko@xxxxxxxxxx> · Fri, 7 Mar 2025 07:10:41 +0200

 On Thu, Dec 19, 2024 at 11:41:58AM -0800, Ross Philipson wrote:
> From: "Daniel P. Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>
> 
> Introduce background, overview and configuration/ABI information
> for the Secure Launch kernel feature.
> 
> Signed-off-by: Daniel P. Smith <dpsmith@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Ross Philipson <ross.philipson@xxxxxxxxxx>
> Reviewed-by: Bagas Sanjaya <bagasdotme@xxxxxxxxx>
> ---
>  Documentation/security/index.rst              |   1 +
>  .../security/launch-integrity/index.rst       |  11 +
>  .../security/launch-integrity/principles.rst  | 317 ++++++++++
>  .../secure_launch_details.rst                 | 587 ++++++++++++++++++
>  .../secure_launch_overview.rst                | 252 ++++++++
>  5 files changed, 1168 insertions(+)
>  create mode 100644 Documentation/security/launch-integrity/index.rst
>  create mode 100644 Documentation/security/launch-integrity/principles.rst
>  create mode 100644 Documentation/security/launch-integrity/secure_launch_details.rst
>  create mode 100644 Documentation/security/launch-integrity/secure_launch_overview.rst
> 
> diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
> index 3e0a7114a862..f89741271ed0 100644
> --- a/Documentation/security/index.rst
> +++ b/Documentation/security/index.rst
> @@ -20,3 +20,4 @@ Security Documentation
>     landlock
>     secrets/index
>     ipe
> +   launch-integrity/index
> diff --git a/Documentation/security/launch-integrity/index.rst b/Documentation/security/launch-integrity/index.rst
> new file mode 100644
> index 000000000000..838328186dd2
> --- /dev/null
> +++ b/Documentation/security/launch-integrity/index.rst
> @@ -0,0 +1,11 @@
> +=====================================
> +System Launch Integrity documentation
> +=====================================
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   principles
> +   secure_launch_overview
> +   secure_launch_details
> +
> diff --git a/Documentation/security/launch-integrity/principles.rst b/Documentation/security/launch-integrity/principles.rst
> new file mode 100644
> index 000000000000..a0553d1d93c2
> --- /dev/null
> +++ b/Documentation/security/launch-integrity/principles.rst
> @@ -0,0 +1,317 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. Copyright (c) 2019-2024 Daniel P. Smith <dpsmith@xxxxxxxxxxxxxxxxxxxx>
> +
> +=======================
> +System Launch Integrity
> +=======================
> +
> +:Author: Daniel P. Smith
> +:Date: August 2024
> +
> +This document serves to establish a common understanding of what a system
> +launch is, the integrity concern for system launch, and why using a Root of Trust
> +(RoT) from a Dynamic Launch may be desirable. Throughout this document,
> +terminology from the Trusted Computing Group (TCG) and National Institute for
> +Standards and Technology (NIST) is used to ensure that vendor natural language is
> +used to describe and reference security-related concepts.
> +
> +System Launch
> +=============
> +
> +There is a tendency to only consider the classical power-on boot as the only
> +means to launch an Operating System (OS) on a computer system. In fact, most
> +modern processors support two system launch methods. To provide clarity,
> +it is important to establish a common definition of a system launch: during
> +a single power life cycle of a system, a system launch consists of an initialization
> +event, typically in hardware, that is followed by an executing software payload
> +that takes the system from the initialized state to a running state. Driven by
> +the Trusted Computing Group (TCG) architecture, modern processors are able to
> +support two methods of system launch. These two methods of system launch are known
> +as Static Launch and Dynamic Launch.
> +
> +Static Launch
> +-------------
> +
> +Static launch is the system launch associated with the power cycle of the CPU.
> +Thus, static launch refers to the classical power-on boot where the
> +initialization event is the release of the CPU from reset and the system
> +firmware is the software payload that brings the system up to a running state.
> +Since static launch is the system launch associated with the beginning of the
> +power lifecycle of a system, it is therefore a fixed, one-time system launch.
> +It is because of this that static launch is referred to and thought of as being
> +"static".
> +
> +Dynamic Launch
> +--------------
> +
> +Modern CPUs architectures provides a mechanism to re-initialize the system to a
> +"known good" state without requiring a power event. This re-initialization
> +event is the event for a dynamic launch and is referred to as the Dynamic
> +Launch Event (DLE). The DLE functions by accepting a software payload, referred
> +to as the Dynamic Configuration Environment (DCE), that execution is handed to
> +after the DLE is invoked. The DCE is responsible for bringing the system back
> +to a running state. Since the dynamic launch is not tied to a power event like
> +the static launch, this enables a dynamic launch to be initiated at any time
> +and multiple times during a single power life cycle. This dynamism is the
> +reasoning behind referring to this system launch as "dynamic".
> +
> +Because a dynamic launch can be conducted at any time during a single power
> +life cycle, they are classified into one of two types: an early launch or a
> +late launch.
> +
> +:Early Launch: When a dynamic launch is used as a transition from a static
> +   launch chain to the final Operating System.
> +
> +:Late Launch: The usage of a dynamic launch by an executing Operating System to
> +   transition to a "known good" state to perform one or more operations, e.g. to
> +   launch into a new Operating System.
> +
> +System Integrity
> +================
> +
> +A computer system can be considered a collection of mechanisms that work
> +together to produce a result. The assurance that the mechanisms are functioning
> +correctly and producing the expected result is the integrity of the system. To
> +ensure a system's integrity, there is a subset of these mechanisms, commonly
> +referred to as security mechanisms, that is present to help ensure the system
> +produces the expected result or at least detects the potential of an unexpected
> +result. Since the security mechanisms are relied upon to ensue the integrity of
> +the system, these mechanisms are trusted. Upon inspection, these security
> +mechanisms each have a set of properties and these properties can be evaluated
> +to determine how susceptible a mechanism might be to failure. This assessment is
> +referred to as the Strength of Mechanism, which allows the trustworthiness of
> +that mechanism to be quantified.
> +
> +For software systems, there are two system states for which the integrity is
> +critical: when the software is loaded into memory and when the software is
> +executing on the hardware. Ensuring that the expected software is loaded into
> +memory is referred to as load-time integrity while ensuring that the software
> +executing is the expected software is the runtime integrity of that software.

I'd consider deleting the first paragraph. It really does not provide
anything useful. The 2nd paragraph is totally sufficient introduction to
the topic, and makes factors more sense.

We don't need a phrase in kernel documentation stating that computer is
a system that produces a result :-)

Should be at least easy enough change to make. I don't think it even
needs any refined version as the text below provides more than enough
(in many places useful) detail to the topic.

> +
> +Load-time Integrity
> +-------------------
> +
> +It is critical to understand what load-time integrity establishes about a
> +system and what is assumed, i.e. what is being trusted. Load-time integrity is

I'd delete the very first sentence completely. It serves zero purpose.
This would be so much less exhausting read if I could just start on
getting the information what load-time integrity is.

Reassurance serves zero purpose. It is up to the read of kernel
documentation to make such evaluation.

> +when a trusted entity, i.e. an entity with an assumed integrity, takes an
> +action to assess an entity being loaded into memory before it is used. A
> +variety of mechanisms may be used to conduct the assessment, each with
> +different properties. A particular property is whether the mechanism creates an
> +evidence of the assessment. Often either cryptographic signature checking or
> +hashing are the common assessment operations used.
> +
> +A signature checking assessment functions by requiring a representation of the
> +accepted authorities and uses those representations to assess if the entity has
> +been signed by an accepted authority. The benefit to this process is that
> +assessment process includes an adjudication of the assessment. The drawbacks
> +are that 1) the adjudication is susceptible to tampering by the Trusted
> +Computing Base (TCB), 2) there is no evidence to assert that an untampered
> +adjudication was completed, and 3) the system must be an active participant in
> +the key management infrastructure.
> +
> +A cryptographic hashing assessment does not adjudicate the assessment, but

This is actually language barrier: is "cryptographic hashing assesment"
same as "cryptographic measurement"? I'd consider using latter as it has
wider reach. Most people know what measurement means if they know any of
cryptography.

> +instead generates evidence of the assessment to be adjudicated independently.
> +The benefits to this approach is that the assessment may be simple such that it
> +may be implemented in an immutable mechanism, e.g. in hardware.  Additionally,
> +it is possible for the adjudication to be conducted where it cannot be tampered
> +with by the TCB. The drawback is that a compromised environment will be allowed
> +to execute until an adjudication can be completed.
> +
> +Ultimately, load-time integrity provides confidence that the correct entity was
> +loaded and in the absence of a run-time integrity mechanism assumes, i.e.
> +trusts, that the entity will never become corrupted.
> +
> +Runtime Integrity
> +-----------------
> +
> +Runtime integrity in the general sense is when a trusted entity makes an
> +assessment of an entity at any point in time during the assessed entity's
> +execution. A more concrete explanation is the taking of an integrity assessment

Great, this is better than the last subsection as it gets straight into
the topic! No reassurance part ;-)

> +of an active process executing on the system at any point during the process'
> +execution. Often the load-time integrity of an operating system's user-space,
> +i.e. the operating environment, is confused with the runtime integrity of the
> +system, since it is an integrity assessment of the "runtime" software. The
> +reality is that actual runtime integrity is a very difficult problem and thus
> +not very many solutions are public and/or available. One example of a runtime
> +integrity solution would be Johns Hopkins Advanced Physics Laboratory's (APL)
> +Linux Kernel Integrity Module (LKIM).
> +
> +Trust Chains
> +============
> +
> +Building upon the understanding of security mechanisms to establish load-time
> +integrity of an entity, it is possible to chain together load-time integrity
> +assessments to establish the integrity of the whole system. This process is
> +known as transitive trust and provides the concept of building a chain of
> +load-time integrity assessments, commonly referred to as a trust chain. These
> +assessments may be used to adjudicate the load-time integrity of the whole
> +system. This trust chain is started by a trusted entity that does the first
> +assessment. This first entity is referred to as the Root of Trust(RoT) with the
> +entities name being derived from the mechanism used for the assessment, i.e.
> +RoT for Verification (RTV) and RoT for Measurement (RTM).
> +
> +A trust chain is itself a mechanism, specifically a mechanism of mechanisms,
> +and therefore it also has a Strength of Mechanism. The factors that contribute
> +to the strength of a trust chain are:
> +
> +  - The strength of the chain's RoT
> +  - The strength of each member of the trust chain
> +  - The length, i.e. the number of members, of the chain
> +
> +Therefore, the strongest trust chains should start with a strong RoT and should
> +consist of members being of low complexity and minimize the number of members
> +participating. In a more colloquial sense, a trust chain is only as strong as its
> +weakest link, thus more links increase the probability of a weak link.
> +
> +Dynamic Launch Components
> +=========================
> +
> +The TCG architecture for dynamic launch is composed of a component series
> +used to set up and then carry out the launch. These components work together to
> +construct an RTM trust chain that is rooted in the dynamic launch and thus commonly
> +referred to as the Dynamic Root of Trust for Measurement (DRTM) chain.
> +
> +What follows is a brief explanation of each component in execution order. A
> +subset of these components are what establishes the dynamic launch's trust
> +chain.
> +
> +Dynamic Configuration Environment Preamble
> +------------------------------------------
> +
> +The Dynamic Configuration Environment (DCE) Preamble is responsible for setting
> +up the system environment in preparation for a dynamic launch. The DCE Preamble
> +is not a part of the DRTM trust chain.
> +
> +Dynamic Launch Event
> +--------------------
> +
> +The dynamic launch event is the event, typically a CPU instruction, that
> +triggers the system's dynamic launch mechanism to begin the launch process. The
> +dynamic launch mechanism is also the RoT for the DRTM trust chain.
> +
> +Dynamic Configuration Environment
> +---------------------------------
> +
> +The dynamic launch mechanism may have resulted in a reset of a portion of the
> +system. To bring the system back to an adequate state for system software, the
> +dynamic launch will hand over control to the DCE. Prior to handing over this
> +control, the dynamic launch will measure the DCE. Once the DCE is complete, it
> +will proceed to measure and then execute the Dynamic Launch Measured
> +Environment (DLME).
> +
> +Dynamic Launch Measured Environment
> +-----------------------------------
> +
> +The DLME is the first system kernel to have control of the system, but may not
> +be the last. Depending on the usage and configuration, the DLME may be the
> +final/target operating system, or it may be a bootloader that will load the
> +final/target operating system.
> +
> +Why DRTM
> +========

Nit: maybe 

Why DTRM?
=========

> +
> +It is a fact that DRTM increases the load-time integrity of the system by
> +providing a trust chain that has an immutable hardware RoT, uses a limited
> +number of small, special purpose code to establish the trust chain that starts
> +the target operating system. As mentioned in the Trust Chain section, these are
> +the main three factors in driving up the strength of a trust chain. As has been
> +seen with the BootHole exploit, which in fact did not affect the integrity of
> +DRTM solutions, the sophistication of attacks targeting system launch is at an
> +all-time high. There is no reason a system should not employ every available
> +hardware integrity measure. This is the crux of a defense-in-depth
> +approach to system security. In the past, the now closed SMI gap was often
> +pointed to as invalidating DRTM, which in fact was nothing but a straw man
> +argument. As has continued to be demonstrated, if/when SMM is corrupted, it can
> +always circumvent all load-time integrity (SRTM and DRTM) because it is a
> +run-time integrity problem. Regardless, Intel and AMD have both deployed
> +runtime integrity for SMI and SMM which is tied directly to DRTM such that this
> +perceived deficiency is now non-existent and the world is moving forward with
> +an expectation that DRTM must be present.

Here's my general feeling about text up to this point. It's way too
verbose and has bad reach especially for non-native speakers.

I don't want nitpick every possible sentence that I think could be
made for punctual.

What I'd suggest instead would be to go through this internalla at
Oracle with some group of people couple of times and try to cut out
all the extra fat.

I gave those review comments in order to give an idea what kind of
stuff look up for. The benefit is that if you get this document more
readable that also as a side-effect lowers the barrier to review the
patch series. Right now this is more exhausting to read than some of
the actualy science papers I've read.

Hope no one takes this personally. What comes after this is much better
fit but I'd still do similar assessment.

Roughly estimated you could have a document 50% of the current length
without loss of information content just by being a factor more
punctual. I'm worried that the series gets ignored partly because
the documentation is already like climbing to a mountain.

BR, Jarkko