RE: [PATCH] crypto: x86/sha512 - load based on CPU features

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Elliott, Robert (Servers) <elliott@xxxxxxx>
> Sent: Friday, August 19, 2022 5:37 PM
> To: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Subject: RE: [PATCH] crypto: x86/sha512 - load based on CPU features
> 
> > -----Original Message-----
> > From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> > Sent: Friday, August 19, 2022 6:05 AM
> > Subject: Re: [PATCH] crypto: x86/sha512 - load based on CPU features
> >
> > Robert Elliott <elliott@xxxxxxx> wrote:
> > > x86 optimized crypto modules built as modules rather than built-in
> > > to the kernel end up as .ko files in the filesystem, e.g., in
> > > /usr/lib/modules. If the filesystem itself is a module, these might
> > > not be available when the crypto API is initialized, resulting in
> > > the generic implementation being used (e.g., sha512_transform rather
> > > than sha512_transform_avx2).
> > >
> > > In one test case, CPU utilization in the sha512 function dropped
> > > from 15.34% to 7.18% after forcing loading of the optimized module.
> > >
> > > Add module aliases for this x86 optimized crypto module based on CPU
> > > feature bits so udev gets a chance to load them later in the boot
> > > process when the filesystems are all running.
> > >
> > > Signed-off-by: Robert Elliott <elliott@xxxxxxx>
> > > ---
> > > arch/x86/crypto/sha512_ssse3_glue.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> >
> > Patch applied.  Thanks.
> 
> I'll post a series that applies this technique to all the other
> x86 modules, if there are no problems reported with sha512.

Suggestion: please revert the sha512-x86 patch for a while.

I've been running with this technique applied to all x86 modules for
a few weeks and noticed a potential problem.

I've seen several cases of "rcu_preempt detected expedited stalls" 
with stack dumps pointing to the x86 crypto functions:
    rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 12-... } 22 jiffies s: 277 root: 0x1/.

(It doesn't always happen; I haven't found a definitive recipe to
trigger the problem yet.)

Three instances occurred during boot, with the stack track pointing
to the sha512-x86 function (my system is set to use SHA-512 for 
module signing, so it's using that algorithm a lot):
	module_sig_check/mod_verify_sig/
	pkcs7_verify/pkcs7_digest/
	sha512_finup/sha512_base_do_update

I also triggered three of them by running "modprobe tcrypt mode=n"
looping from 0 to 1000, all in the aes-x86 module:
	tcrypt: testing encryption speed of sync skcipher cts(cbc(aes)) using cts(cbc(aes-aesni))
	tcrypt: testing encryption speed of sync skcipher cfb(aes) using cfb(aes-aesni)
	tcrypt: testing decryption speed of sync skcipher cfb(aes) using cfb(aes-aesni)

In that case the stack traces pointed to:
	do_test/prepare_alloc_pages/test_skcipher_speed.cold

Theory:
I noticed the proposed aria-x86 module has these calls surrounding
its main data processing functions:
	kernel_fpu_begin();
	kernel_fpu_end();

The sha-x86 and aes-x86 implementations use those as well. For
example, sha512_update() is structured as:
        kernel_fpu_begin();
        sha512_base_do_update(desc, data, len, sha512_xform);
        kernel_fpu_end();

and aesni_encrypt() is structured as:
	kernel_fpu_begin();
	aesni_enc(ctx, dst, src);
	kernel_fpu_end();

I noticed that kernel_fpu_begin() includes this:
	preempt_disable();

and kernel_fpu_end() has:
	preempt_enable();

Is that sufficient to cause the RCU problems? 

Although preempt_disable isn't disabling interrupts, it's blocking the
scheduler from using the CPUs (A few of the arm AES functions mess
with interrupts, but none of the others seem to do so). So, I suspect
that could lead to RCU problems and soft lockups, but not hard
lockups.

Do these functions need to break up their processing into smaller chunks
(e.g., a few Megabytes), calling kernel_fpu_end() periodically to 
allow the scheduler to take over the CPUs if needed? If so, what
chunk size would be appropriate?



Example stack trace:
[   29.729811] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 12-... } 22 jiffies s: 277 root: 0x1/.
[   29.729815] rcu: blocking rcu_node structures (internal RCU debug): l=1:0-13:0x1000/.
[   29.729818] Task dump for CPU 12:
[   29.729819] task:modprobe        state:R  running task     stack:    0 pid: 1703 ppid:  1702 flags:0x0000400a
[   29.729822] Call Trace:
[   29.729823]  <TASK>
[   29.729825]  ? sysvec_apic_timer_interrupt+0xab/0xc0
[   29.729830]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[   29.729835]  ? loop1+0x3f2/0x98f [sha512_ssse3]
[   29.729839]  ? crypto_create_tfm_node+0x33/0x100
[   29.729843]  ? nowork+0x6/0x6 [sha512_ssse3]
[   29.729844]  ? sha512_base_do_update.isra.0+0xeb/0x160 [sha512_ssse3]
[   29.729847]  ? nowork+0x6/0x6 [sha512_ssse3]
[   29.729849]  ? sha512_finup.part.0+0x1de/0x230 [sha512_ssse3]
[   29.729851]  ? pkcs7_digest+0xaf/0x1f0
[   29.729854]  ? pkcs7_verify+0x61/0x540
[   29.729856]  ? verify_pkcs7_message_sig+0x4a/0xe0
[   29.729859]  ? pkcs7_parse_message+0x174/0x1b0
[   29.729861]  ? verify_pkcs7_signature+0x4c/0x80
[   29.729862]  ? mod_verify_sig+0x74/0x90
[   29.729867]  ? module_sig_check+0x87/0xd0
[   29.729868]  ? load_module+0x4e/0x1fc0
[   29.729871]  ? xfs_file_read_iter+0x70/0xe0 [xfs]
[   29.729955]  ? __kernel_read+0x118/0x290
[   29.729959]  ? ima_post_read_file+0xac/0xc0
[   29.729962]  ? kernel_read_file+0x211/0x2a0
[   29.729965]  ? __do_sys_finit_module+0x93/0xf0
[   29.729967]  ? __do_sys_finit_module+0x93/0xf0
[   29.729969]  ? do_syscall_64+0x58/0x80
[   29.729971]  ? syscall_exit_to_user_mode+0x17/0x40
[   29.729973]  ? do_syscall_64+0x67/0x80
[   29.729975]  ? exit_to_user_mode_prepare+0x18f/0x1f0
[   29.729976]  ? syscall_exit_to_user_mode+0x17/0x40
[   29.729977]  ? do_syscall_64+0x67/0x80
[   29.729979]  ? syscall_exit_to_user_mode+0x17/0x40
[   29.729980]  ? do_syscall_64+0x67/0x80
[   29.729982]  ? exc_page_fault+0x70/0x170
[   29.729983]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   29.729986]  </TASK>
[   29.753810] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { } 236 jiffies s: 281 root: 0x0/.






[Index of Archives]     [Kernel]     [Gnu Classpath]     [Gnu Crypto]     [DM Crypt]     [Netfilter]     [Bugtraq]
  Powered by Linux