Re: [RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Deepak,

On 2024-03-19 6:55 PM, Deepak Gupta wrote:
> On Tue, Mar 19, 2024 at 2:59 PM Samuel Holland via lists.riscv.org
> <samuel.holland=sifive.com@xxxxxxxxxxxxxxx> wrote:
>>
>> Some envcfg bits need to be controlled on a per-thread basis, such as
>> the pointer masking mode. However, the envcfg CSR value cannot simply be
>> stored in struct thread_struct, because some hardware may implement a
>> different subset of envcfg CSR bits is across CPUs. As a result, we need
>> to combine the per-CPU and per-thread bits whenever we switch threads.
>>
> 
> Why not do something like this
> 
> diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> index b3400517b0a9..01ba87954da2 100644
> --- a/arch/riscv/include/asm/csr.h
> +++ b/arch/riscv/include/asm/csr.h
> @@ -202,6 +202,8 @@
>  #define ENVCFG_CBIE_FLUSH              _AC(0x1, UL)
>  #define ENVCFG_CBIE_INV                        _AC(0x3, UL)
>  #define ENVCFG_FIOM                    _AC(0x1, UL)
> +/* by default all threads should be able to zero cache */
> +#define ENVCFG_BASE                    ENVCFG_CBZE

Linux does not assume Sstrict, so without Zicboz being present in DT/ACPI, we
have no idea what the CBZE bit does--there's no guarantee it has the standard
meaning--so it's not safe to set the bit unconditionally. If that policy
changes, we could definitely simplify the code.

>  /* Smstateen bits */
>  #define SMSTATEEN0_AIA_IMSIC_SHIFT     58
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 4f21d970a129..2420123444c4 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -152,6 +152,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
>         else
>                 regs->status |= SR_UXL_64;
>  #endif
> +       current->thread_info.envcfg = ENVCFG_BASE;
>  }
> 
> And instead of context switching in `_switch_to`,
> In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.

The immediate reason is that writing envcfg in ret_from_exception() adds cycles
to every IRQ and system call exit, even though most of them will not change the
envcfg value. This is especially the case when returning from an IRQ/exception
back to S-mode, since envcfg has zero effect there.

The CSRs that are read/written in entry.S are generally those where the value
can be updated by hardware, as part of taking an exception. But envcfg never
changes on its own. The kernel knows exactly when its value will change, and
those places are:

 1) Task switch, i.e. switch_to()
 2) execve(), i.e. start_thread() or flush_thread()
 3) A system call that specifically affects a feature controlled by envcfg

So that's where this series writes it. There are a couple of minor tradeoffs
about when exactly to do the write:

- We could drop the sync_envcfg() calls outside of switch_to() by reading the
  current CSR value when scheduling out a thread, but again that adds overhead
  to the fast path to remove a tiny bit of code in the prctl() handlers.
- We don't need to write envcfg when switching to a kernel thread, only when
  switching to a user thread, because kernel threads never leave S-mode, so
  envcfg doesn't affect them. But checking the thread type takes many more
  instructions than just writing the CSR.

Overall, the optimal implementation will approximate the rule of only writing
envcfg when its value changes.

> This construction avoids
> - declaring per cpu riscv_cpu_envcfg

This is really a separate concern than when we write envcfg. The per-CPU
variable is only necessary to support hardware where a subset of harts support
Zicboz. Since the riscv_cpu_has_extension_[un]likely() helpers were added
specifically for Zicboz, I assume this is an important use case, and dropping
support for this hardware would be a regression. After all, hwprobe() allows
userspace to see that Zicboz is implemented at a per-CPU level. Maybe Andrew can
weigh in on that.

If we decide to enable Zicboz only when all harts support it, or we decide it's
safe to attempt to set the envcfg.CBZE bit on harts that do not declare support
for Zicboz, then we could drop the percpu variable.

> - syncing up
> - collection of *envcfg bits.
> 
> 
>> Signed-off-by: Samuel Holland <samuel.holland@xxxxxxxxxx>
>> ---
>>
>>  arch/riscv/include/asm/cpufeature.h |  2 ++
>>  arch/riscv/include/asm/processor.h  |  1 +
>>  arch/riscv/include/asm/switch_to.h  | 12 ++++++++++++
>>  arch/riscv/kernel/cpufeature.c      |  4 +++-
>>  4 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
>> index 0bd11862b760..b1ad8d0b4599 100644
>> --- a/arch/riscv/include/asm/cpufeature.h
>> +++ b/arch/riscv/include/asm/cpufeature.h
>> @@ -33,6 +33,8 @@ DECLARE_PER_CPU(long, misaligned_access_speed);
>>  /* Per-cpu ISA extensions. */
>>  extern struct riscv_isainfo hart_isa[NR_CPUS];
>>
>> +DECLARE_PER_CPU(unsigned long, riscv_cpu_envcfg);
>> +
>>  void riscv_user_isa_enable(void);
>>
>>  #ifdef CONFIG_RISCV_MISALIGNED
>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>> index a8509cc31ab2..06b87402a4d8 100644
>> --- a/arch/riscv/include/asm/processor.h
>> +++ b/arch/riscv/include/asm/processor.h
>> @@ -118,6 +118,7 @@ struct thread_struct {
>>         unsigned long s[12];    /* s[0]: frame pointer */
>>         struct __riscv_d_ext_state fstate;
>>         unsigned long bad_cause;
>> +       unsigned long envcfg;
>>         u32 riscv_v_flags;
>>         u32 vstate_ctrl;
>>         struct __riscv_v_ext_state vstate;
>> diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
>> index 7efdb0584d47..256a354a5c4a 100644
>> --- a/arch/riscv/include/asm/switch_to.h
>> +++ b/arch/riscv/include/asm/switch_to.h
>> @@ -69,6 +69,17 @@ static __always_inline bool has_fpu(void) { return false; }
>>  #define __switch_to_fpu(__prev, __next) do { } while (0)
>>  #endif
>>
>> +static inline void sync_envcfg(struct task_struct *task)
>> +{
>> +       csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | task->thread.envcfg);
>> +}
>> +
>> +static inline void __switch_to_envcfg(struct task_struct *next)
>> +{
>> +       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
> 
> I've seen `riscv_cpu_has_extension_unlikely` generating branchy code
> even if ALTERNATIVES was turned on.
> Can you check disasm on your end as well.  IMHO, `entry.S` is a better
> place to pick up *envcfg.

The branchiness is sort of expected, since that function is implemented by
switching on/off a branch instruction, so the alternate code is necessarily a
separate basic block. It's a tradeoff so we don't have to write assembly code
for every bit of code that depends on an extension. However, the cost should be
somewhat lowered since the branch is unconditional and so entirely predictable.

If the branch turns out to be problematic for performance, then we could use
ALTERNATIVE directly in sync_envcfg() to NOP out the CSR write.

>> +               sync_envcfg(next);
>> +}
>> +
>>  extern struct task_struct *__switch_to(struct task_struct *,
>>                                        struct task_struct *);
>>
>> @@ -80,6 +91,7 @@ do {                                                  \
>>                 __switch_to_fpu(__prev, __next);        \
>>         if (has_vector())                                       \
>>                 __switch_to_vector(__prev, __next);     \
>> +       __switch_to_envcfg(__next);                     \
>>         ((last) = __switch_to(__prev, __next));         \
>>  } while (0)
>>
>> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> index d1846aab1f78..32aaaf41f8a8 100644
>> --- a/arch/riscv/kernel/cpufeature.c
>> +++ b/arch/riscv/kernel/cpufeature.c
>> @@ -44,6 +44,8 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
>>  /* Per-cpu ISA extensions. */
>>  struct riscv_isainfo hart_isa[NR_CPUS];
>>
>> +DEFINE_PER_CPU(unsigned long, riscv_cpu_envcfg);
>> +
>>  /* Performance information */
>>  DEFINE_PER_CPU(long, misaligned_access_speed);
>>
>> @@ -978,7 +980,7 @@ arch_initcall(check_unaligned_access_all_cpus);
>>  void riscv_user_isa_enable(void)
>>  {
>>         if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
>> -               csr_set(CSR_ENVCFG, ENVCFG_CBZE);
>> +               this_cpu_or(riscv_cpu_envcfg, ENVCFG_CBZE);

If we drop the percpu variable, this becomes

	if (riscv_has_extension_unlikely(RISCV_ISA_EXT_ZICBOZ))
		current->thread.envcfg |= ENVCFG_CBZE;

since the init thread's envcfg gets copied to all other threads via fork(), and
we can drop the call to riscv_user_isa_enable() from smp_callin(). Or if we
decide CBZE is always safe to set, then the function is even simpler:

	current->thread.envcfg = ENVCFG_CBZE;

Regards,
Samuel

>>  }
>>
>>  #ifdef CONFIG_RISCV_ALTERNATIVE
>> --
>> 2.43.1





[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux