Re: [PATCH v2 04/21] arm64/sme: Document SME 2 and SME 2.1 ABI

On 11/11/22 10:17, Luis Machado wrote:
On 11/1/22 14:33, Mark Brown wrote:
As well as a number of simple features which only add new instructions and
require corresponding hwcaps SME2 introduces a new register ZT0 for which
we must define ABI. Fortunately this is a fixed size 512 bits and therefore
much more straightforward than the base SME state, the only wrinkle is that
it is only accessible when ZA is accessible.

While there is only a single register the architecture is written with a
view to exensibility, including a number in the name, so follow this in the
ABI.

Signed-off-by: Mark Brown <broonie@xxxxxxxxxx>
---
  Documentation/arm64/sme.rst | 52 ++++++++++++++++++++++++++++++-------
  1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst
index 16d2db4c2e2e..5f7eabee4853 100644
--- a/Documentation/arm64/sme.rst
+++ b/Documentation/arm64/sme.rst
@@ -18,14 +18,19 @@ model features for SME is included in Appendix A.
  1.  General
  -----------
-* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA
-  register state and TPIDR2_EL0 are tracked per thread.
+* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA and (when
+  present) ZT0 register state and TPIDR2_EL0 are tracked per thread.
  * The presence of SME is reported to userspace via HWCAP2_SME in the aux vector
    AT_HWCAP2 entry.  Presence of this flag implies the presence of the SME
    instructions and registers, and the Linux-specific system interfaces
    described in this document.  SME is reported in /proc/cpuinfo as "sme".
+* The presence of SME2 is reported to userspace via HWCAP2_SME in the

I suppose HWCAP2_SME -> HWCAP2_SME2?

+  aux vector AT_HWCAP2 entry.  Presence of this flag implies the presence of
+  the SME2 instructions and ZT0, and the Linux-specific system interfaces
+  described in this document.  SME2 is reported in /proc/cpuinfo as "sme2".
+
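
For reference, the userspace check this bullet describes could look roughly like the sketch below. It is illustration only: EXAMPLE_HWCAP2_SME2 is a placeholder I made up for the example, and real code should use the HWCAP2_SME2 definition from <asm/hwcap.h> once the series lands. The helper names are hypothetical too.

```c
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP2 */

/*
 * ASSUMPTION: placeholder bit for illustration only. On arm64, include
 * <asm/hwcap.h> and use the HWCAP2_SME2 value defined there instead.
 */
#define EXAMPLE_HWCAP2_SME2	(1ULL << 37)

/* Pure helper so the bit test can be exercised without SME2 hardware. */
static bool hwcap2_has_sme2(unsigned long long hwcap2)
{
	return (hwcap2 & EXAMPLE_HWCAP2_SME2) != 0;
}

/* Query the aux vector of the calling process; only meaningful on arm64. */
static bool cpu_has_sme2(void)
{
	return hwcap2_has_sme2(getauxval(AT_HWCAP2));
}
```

A caller would gate any use of the SME2 instructions or ZT0 on cpu_has_sme2() returning true.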
  * Support for the execution of SME instructions in userspace can also be
    detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS
    instruction, and checking that the value of the SME field is nonzero. [3]
@@ -44,6 +49,7 @@ model features for SME is included in Appendix A.
      HWCAP2_SME_B16F32
      HWCAP2_SME_F32F32
      HWCAP2_SME_FA64
+        HWCAP2_SME2
    This list may be extended over time as the SME architecture evolves.
@@ -52,8 +58,8 @@ model features for SME is included in Appendix A.
    cpu-feature-registers.txt for details.
  * Debuggers should restrict themselves to interacting with the target via the
-  NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets.  The recommended way
-  of detecting support for these regsets is to connect to a target process
+  NT_ARM_SVE, NT_ARM_SSVE, NT_ARM_ZA and NT_ARM_ZT regsets.  The recommended
+  way of detecting support for these regsets is to connect to a target process
    first and then attempt a
      ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov).
@@ -89,13 +95,13 @@ be zeroed.
  -------------------------
  * On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the
-  ZA matrix are preserved.
+  ZA matrix and ZT0 (if present) are preserved.
  * On syscall PSTATE.SM will be cleared and the SVE registers will be handled
    as per the standard SVE ABI.
-* Neither the SVE registers nor ZA are used to pass arguments to or receive
-  results from any syscall.
+* None of the SVE registers, ZA or ZT0 are used to pass arguments to
+  or receive results from any syscall.
  * On process creation (eg, clone()) the newly created process will have
    PSTATE.SM cleared.
@@ -134,6 +140,14 @@ be zeroed.
    __reserved[] referencing this space.  za_context is then written in the
    extra space.  Refer to [1] for further details about this mechanism.
+* If ZT is supported and PSTATE.ZA==1 then a signal frame record for ZT will
+  be generated.

I noticed we refer to ZT0 as ZT sometimes. Should we use ZT0 throughout? Or maybe ZT, if it makes more sense?

Otherwise it can get a bit confusing.


Reading through the rest of the series, I noticed we're leaving room for more ZT registers in the future.

+
+* The signal record for ZT has magic ZT_MAGIC (0x73d4e827) and consists of a
+  standard signal frame header followed by a struct zt_context specifying
+  the number of ZT registers supported by the system, then zt_contxt.nregs

zt_contxt -> zt_context

+  blocks of 64 bytes of data per register.
+
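
To check my reading of the record layout, here is a rough sketch of how a signal handler might locate the ZT data in the frame. The magic value and the 64 bytes per register come from the text above. The struct layouts (an 8 byte header, then a 16 bit nregs padded to 16 bytes in total) and all the names are my assumptions for illustration, and the uapi header is the authority.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ZT_MAGIC	0x73d4e827u	/* from the patch text */
#define ZT_REG_BYTES	64		/* 512 bits per ZT register */

/* Standard signal frame record header. */
struct rec_head {
	uint32_t magic;
	uint32_t size;		/* whole record, header included */
};

/* ASSUMPTION: illustrative layout, not copied from the uapi header. */
struct example_zt_context {
	struct rec_head head;
	uint16_t nregs;
	uint16_t reserved[3];
};

/*
 * Walk the records in buf[0..len) and return a pointer to the ZT register
 * data, storing the register count in *nregs.  Returns NULL if there is
 * no well-formed ZT record.
 */
static const uint8_t *find_zt(const uint8_t *buf, size_t len, unsigned *nregs)
{
	size_t off = 0;

	while (len - off >= sizeof(struct rec_head)) {
		struct rec_head h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.magic == 0 && h.size == 0)
			break;		/* terminator record */
		if (h.size < sizeof(h) || h.size > len - off)
			break;		/* malformed record */
		if (h.magic == ZT_MAGIC) {
			struct example_zt_context zt;

			if (h.size < sizeof(zt))
				return NULL;
			memcpy(&zt, buf + off, sizeof(zt));
			/* Payload must hold nregs blocks of 64 bytes. */
			if (h.size - sizeof(zt) < (size_t)zt.nregs * ZT_REG_BYTES)
				return NULL;
			*nregs = zt.nregs;
			return buf + off + sizeof(zt);
		}
		off += h.size;
	}
	return NULL;
}
```

In a real handler buf would be the __reserved[] area of the sigcontext. A NULL return would then mean ZT is unsupported or PSTATE.ZA was 0 when the signal was delivered.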
  5.  Signal return
  -----------------
@@ -151,6 +165,9 @@ When returning from a signal handler:
    the signal frame does not match the current vector length, the signal return
    attempt is treated as illegal, resulting in a forced SIGSEGV.
+* If ZT is not supported or PSTATE.ZA==0 then it is illegal to have a
+  signal frame record for ZT, resulting in a forced SIGSEGV.
+
  6.  prctl extensions
  --------------------
@@ -214,8 +231,8 @@ prctl(PR_SME_SET_VL, unsigned long arg)
        vector length that will be applied at the next execve() by the calling
        thread.
-    * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of
-      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
+    * Changing the vector length causes all of ZA, ZT, P0..P15, FFR and all
+      bits of Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
        unspecified, including both streaming and non-streaming SVE state.
        Calling PR_SME_SET_VL with vl equal to the thread's current vector
        length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag,
@@ -317,6 +334,15 @@ The regset data starts with struct user_za_header, containing:
  * The effect of writing a partial, incomplete payload is unspecified.
+* A new regset NT_ARM_ZT is defined for for access to ZT state via

Typo: "for" appears twice.

+  PTRACE_GETREGSET and PTRACE_SETREGSET.
+
+* The NT_ARM_ZT regset consists of a single 512 bit register.
+
+* When PSTATE.ZA==0 reads of NT_ARM_ZT will report all bits of ZT as 0.
+
+* Writes to NT_ARM_ZT will set PSTATE.ZA to 1.
+
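
From the debugger side, I would expect a read of the new regset to look something like the sketch below. It assumes the tracer is already attached and the tracee is stopped. The fallback NT_ARM_ZT value and the helper name are assumptions on my part; the value should come from the kernel's elf.h.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_ZT
#define NT_ARM_ZT	0x40d	/* ASSUMPTION: take the real value from elf.h */
#endif

#define ZT_REGSET_BYTES	64	/* a single 512 bit register */

/*
 * Read ZT0 from a stopped tracee.  Returns false if the regset is not
 * supported or the read fails, which doubles as the detection method
 * recommended for the other SME regsets.
 */
static bool read_zt0(pid_t pid, uint8_t out[ZT_REGSET_BYTES])
{
	struct iovec iov = { .iov_base = out, .iov_len = ZT_REGSET_BYTES };

	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_ZT, &iov) == -1)
		return false;

	/* The kernel updates iov_len to the number of bytes it returned. */
	return iov.iov_len == ZT_REGSET_BYTES;
}
```

Per the text above, a successful read while PSTATE.ZA==0 would return all zeros, so a debugger cannot use the contents alone to tell whether ZA is enabled.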
  8.  ELF coredump extensions
  ---------------------------
@@ -331,6 +357,11 @@ The regset data starts with struct user_za_header, containing:
    been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
    when the coredump was generated.
+* A NT_ARM_ZT note will be added to each coredump for each thread of the
+  dumped process.  The contents will be equivalent to the data that would have
+  been read if a PTRACE_GETREGSET of NT_ARM_ZT were executed for each thread
+  when the coredump was generated.
+
  * The NT_ARM_TLS note will be extended to two registers, the second register
    will contain TPIDR2_EL0 on systems that support SME and will be read as
    zero with writes ignored otherwise.
@@ -406,6 +437,9 @@ In A64 state, SME adds the following:
    For best system performance it is strongly encouraged for software to enable
    ZA only when it is actively being used.
+* A new ZT0 register is introduced when SME2 is present. This is a 512 bit
+  register which is accessible PSTATE.ZA is set, as ZA itself is.

Missing word: "accessible when PSTATE.ZA is set"?

+
  * Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and
    SMSTOP instructions or by access to the SVCR system register:




