From: Robert Elliott
> Sent: 12 October 2022 22:59
>
> As done by the ECB and CBC helpers in arch/x86/crypt/ecb_cbc_helpers.h,
> limit the number of bytes processed between kernel_fpu_begin() and
> kernel_fpu_end() calls.
>
> Those functions call preempt_disable() and preempt_enable(), so
> the CPU core is unavailable for scheduling while running, leading to:
>     rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: ...

How long were the buffers being processed when the rcu stall was reported?

It looks like you are adding kernel_fpu_end(); kernel_fpu_begin() pairs
every 4096 bytes.
I'd guess the crc instruction runs at 4 bytes/clock (or at least gets
somewhere near that).
So you are talking of a few thousand clocks at most.
A pci read from a device can easily take much longer than that.
So I'm surprised you need to do such small buffers to avoid rcu stalls.

The kernel_fpu_end(); kernel_fpu_begin() pair will also cost.
(Maybe not as much as the first kernel_fpu_begin()?)

Some performance figures might be enlightening.

	David
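
For reference, the chunking pattern and the clock estimate discussed above can be sketched in userspace. This is only an illustration, not the patch itself: fake_fpu_begin(), fake_fpu_end(), fake_crc_update(), crc_chunked() and clocks_per_section() are hypothetical names, the "crc" is a plain byte sum standing in for the accelerated routine, and the 4 bytes/clock throughput is the guess made above rather than a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins for kernel_fpu_begin()/kernel_fpu_end().
 * They only track nesting and count sections; nothing here touches the FPU
 * or preemption.
 */
static unsigned int fpu_sections;
static int fpu_active;

static void fake_fpu_begin(void)
{
	assert(!fpu_active);	/* sections must not nest */
	fpu_active = 1;
	fpu_sections++;
}

static void fake_fpu_end(void)
{
	assert(fpu_active);
	fpu_active = 0;
}

/* Placeholder for the accelerated CRC routine: a byte sum, NOT a real CRC. */
static uint32_t fake_crc_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--)
		crc += *p++;
	return crc;
}

/* Process at most 'chunk' bytes inside each begin/end section. */
static uint32_t crc_chunked(uint32_t crc, const uint8_t *data, size_t len,
			    size_t chunk)
{
	assert(chunk > 0);

	while (len) {
		size_t n = len < chunk ? len : chunk;

		fake_fpu_begin();
		crc = fake_crc_update(crc, data, n);
		fake_fpu_end();	/* a real kernel could reschedule here */

		data += n;
		len -= n;
	}
	return crc;
}

/* Clocks spent inside one section, for an ASSUMED bytes-per-clock rate. */
static unsigned long clocks_per_section(size_t chunk,
					unsigned int bytes_per_clock)
{
	assert(bytes_per_clock > 0);
	return (chunk + bytes_per_clock - 1) / bytes_per_clock;
}
```

With a 4096-byte chunk and the assumed 4 bytes/clock, clocks_per_section(4096, 4) gives 1024 clocks per section, and a 10000-byte buffer is split into three sections. Whether the extra begin/end pairs cost more than they save is exactly what real performance figures would have to show.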