On Wed, 17 May 2023 at 10:54, Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote: > > On 2023-05-16 19:13:26 [+0200], Ard Biesheuvel wrote: > > On Tue, 16 May 2023 at 18:33, Sebastian Andrzej Siewior > > <bigeasy@xxxxxxxxxxxxx> wrote: > > > > > > + ard, peterz > > > > Hello Sebastian, > Hi Ard, > > > Please check whether 6.4-rc2 is equally affected - we fixed some > > issues related to BH en/disabling from asm code. > > oh oh Ard, you are the best. Thank you. > Well, I was the one who broke things in the first place, so it's only reasonable that I was the one fixing them again. For context, these changes give a substantial performance boost on the RX path to IPsec or WireGuard using SIMD based software crypto. > … > > Please take a look, and confirm whether or not we still need to drop > > the get_cpu() call from vfp_sync_hwstate() > > Yes, it needs to be dealt with. At the time vfp_sync_hwstate() the > context is fully preemptible. So that get_cpu() needs to be removed > before local_bh_disable(). But… > > If I understand the logic right, then the VFP state is saved on context > switch and the FPU access is disabled but the content remains and is in > sync with vfp_current_hw_state. Upon first usage (after the context > switch) in userland there is an exception at which point the access to > the FPU is enabled (FPEXC_EX) and the FPU content is loaded unless it is > in sync as per vfp_current_hw_state. > Exactly. On UP we lazily preserve the user space FP state, and on SMP we always preserve it on a context switch. Then, only when user space actually tries to use the SIMD context, the correct data is copied in. That way, we don't take the performance hit for tasks that are blocked in the kernel. > That means it relies on disabled preemption while operating on > vfp_current_hw_state. In general, this code relies on preserving/restoring the VFP context to/from memory to be a critical section, as it needs to run to completion once it is started. > On PREEMPT_RT softirqs are only handled in process > context (for instance at the end of the threaded interrupt). Therefore > it is sufficient to disable preemption instead of BH. It would need > something like x86's fpregs_lock(). I think this is not the only issue, to be honest. We cannot preempt tasks while they are using the SIMD unit in kernel mode, as the kernel mode context is never preserved/restored. So we at least need to disable preemption again in kernel_neon_begin() [which now relies on BH disable to disable preemption but as you point out, that is only the case on !RT] Given this, I wonder whether supporting kernel mode NEON on PREEMPT_RT is worth it to begin with. AIUI, the alternative is to disable both preemption and BH when touching the VFP state. > Otherwise vfp_entry() can get preempted after decisions based on > vfp_current_hw_state have been made. Also kernel_neon_begin() could get > preempted after enabling FPU. I think based on current logic, after > kernel_neon_begin() a task can get preempted on PREEMPT_RT and since > vfp_current_hw_state is NULL the registers won't be saved and a zero set > of registers will be loaded once the neon task gets back on the CPU (it > seems that VFP exceptions in kernel mode are treated the same way as > those in user mode). > > What do you think? > Indeed. kernel_neon_begin() no longer disabling preemption is definitely wrong on PREEMPT_RT. The question is whether we want to untangle this or just make PREEMPT_RT and KERNEL_MODE_NEON mutually exclusive. This, by itself, makes quite a lot of sense, actually, given that on actual 32-bit hardware, the SIMD speedup is not that spectacular (the AES and other crypto instructions, which do give an order of magnitude speedup are only implemented on 64-bit cores, which usually run a 64-bit kernel). So if real-time behavior is a requirement, using ordinary crypto implemented in C (which does not require preemption to be disabled) is likely preferred anyway.