On 2023/8/23 17:28, Herbert Xu wrote:
> On Wed, Aug 23, 2023 at 07:30:47AM +0000, Lu Jialin wrote:
>> We found a hungtask bug in test_aead_vec_cfg as follows:
>>
>> INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Call trace:
>>  __switch_to+0x98/0xe0
>>  __schedule+0x6c4/0xf40
>>  schedule+0xd8/0x1b4
>>  schedule_timeout+0x474/0x560
>>  wait_for_common+0x368/0x4e0
>>  wait_for_completion+0x20/0x30
>>  test_aead_vec_cfg+0xab4/0xd50
>>  test_aead+0x144/0x1f0
>>  alg_test_aead+0xd8/0x1e0
>>  alg_test+0x634/0x890
>>  cryptomgr_test+0x40/0x70
>>  kthread+0x1e0/0x220
>>  ret_from_fork+0x10/0x18
>> Kernel panic - not syncing: hung_task: blocked tasks
>>
>> For padata_do_parallel, when the return err is 0 or -EBUSY, it will call
>> wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal
>> case, aead_request_complete() will be called in pcrypt_aead_serial and the
>> return err is 0 for padata_do_parallel. But, when pinst->flags is
>> PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it
>> won't call aead_request_complete(). Therefore, test_aead_vec_cfg will
>> hung at wait_for_completion(&wait->completion), which will cause
>> hungtask.
>>
>> The problem comes as following:
>> (padata_do_parallel)                  |
>>     rcu_read_lock_bh();               |
>>     err = -EINVAL;                    |   (padata_replace)
>>                                       |     pinst->flags |= PADATA_RESET;
>>     err = -EBUSY                      |
>>     if (pinst->flags & PADATA_RESET)  |
>>         rcu_read_unlock_bh()          |
>>         return err
>>
>> In order to resolve the problem, we retry at most 5 times when
>> padata_do_parallel return -EBUSY. For more than 5 times, we replace the
>> return err -EBUSY with -EAGAIN, which means parallel_data is changing, and
>> the caller should call it again.
>
> Steffen, should we retry this at all?  Or should it just fail as it
> did before?
>
> Thanks,

It should be fine if we don't retry and just fail with -EAGAIN and let
the caller handle it.
It should not break the meaning of the error code.

--
Best
GUO Zihua
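
To illustrate the fail-without-retry idea, here is a minimal userspace
sketch. It is not kernel code and not the actual patch: every model_* name
and MODEL_RESET are hypothetical stand-ins (for padata_do_parallel, the
pcrypt request path, and PADATA_RESET). It also assumes the caller waits on
the completion whenever it sees -EINPROGRESS or -EBUSY, which is the
behaviour described for test_aead_vec_cfg above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_RESET 0x1u

struct model_instance {
	unsigned int flags;	/* MODEL_RESET stands in for PADATA_RESET */
};

struct model_request {
	bool completed;		/* stands in for the completion the tester waits on */
	int status;
};

/*
 * Models the submit path: refuse with -EBUSY while a reset is in flight.
 * On refusal the request is never completed.
 */
static int model_do_parallel(const struct model_instance *inst,
			     struct model_request *req)
{
	assert(inst != NULL && req != NULL);

	if (inst->flags & MODEL_RESET)
		return -EBUSY;

	req->status = 0;
	req->completed = true;	/* the serial callback would complete it */
	return 0;
}

/*
 * Models the caller-facing wrapper: no retry. A reset-induced -EBUSY is
 * reported as -EAGAIN so the caller does not mistake it for "queued".
 */
static int model_encrypt(const struct model_instance *inst,
			 struct model_request *req)
{
	int err = model_do_parallel(inst, req);

	if (err == -EBUSY)
		return -EAGAIN;
	return err ? err : -EINPROGRESS;
}

/* Assumed caller policy: only these two codes mean "wait for completion". */
static bool model_should_wait(int err)
{
	return err == -EINPROGRESS || err == -EBUSY;
}
```

With the reset flag set, model_encrypt() returns -EAGAIN, model_should_wait()
is false, and the caller never blocks on a completion that will not arrive.
-EBUSY keeps its "request was queued" meaning for callers that rely on it.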