Re: [PATCH v2] crypto: Fix hungtask for PADATA_RESET

"Guozihua (Scott)" <guozihua@xxxxxxxxxx> · Fri, 1 Sep 2023 10:28:08 +0800



On 2023/8/23 17:28, Herbert Xu wrote:
> On Wed, Aug 23, 2023 at 07:30:47AM +0000, Lu Jialin wrote:
>> We found a hungtask bug in test_aead_vec_cfg as follows:
>>
>> INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Call trace:
>>  __switch_to+0x98/0xe0
>>  __schedule+0x6c4/0xf40
>>  schedule+0xd8/0x1b4
>>  schedule_timeout+0x474/0x560
>>  wait_for_common+0x368/0x4e0
>>  wait_for_completion+0x20/0x30
>>  test_aead_vec_cfg+0xab4/0xd50
>>  test_aead+0x144/0x1f0
>>  alg_test_aead+0xd8/0x1e0
>>  alg_test+0x634/0x890
>>  cryptomgr_test+0x40/0x70
>>  kthread+0x1e0/0x220
>>  ret_from_fork+0x10/0x18
>>  Kernel panic - not syncing: hung_task: blocked tasks
>>
>> For padata_do_parallel, when the returned err is 0 or -EBUSY,
>> test_aead_vec_cfg calls wait_for_completion(&wait->completion). In the
>> normal case, aead_request_complete() is called in pcrypt_aead_serial and
>> padata_do_parallel returns 0. But when pinst->flags has PADATA_RESET set,
>> padata_do_parallel returns -EBUSY and aead_request_complete() is never
>> called. Therefore, test_aead_vec_cfg hangs at
>> wait_for_completion(&wait->completion), which triggers the hung task
>> report.
>>
>> The problem arises as follows:
>> (padata_do_parallel)                 |
>>     rcu_read_lock_bh();              |
>>     err = -EINVAL;                   |   (padata_replace)
>>                                      |     pinst->flags |= PADATA_RESET;
>>     err = -EBUSY                     |
>>     if (pinst->flags & PADATA_RESET) |
>>         rcu_read_unlock_bh()         |
>>         return err
>>
>> To resolve the problem, we retry at most 5 times when
>> padata_do_parallel returns -EBUSY. If it still returns -EBUSY after
>> that, we replace the returned -EBUSY with -EAGAIN, which means
>> parallel_data is changing and the caller should call it again.
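
For reference, the bounded retry described above can be modelled in user
space roughly as follows. This is only a sketch, not the patch itself:
model_instance, model_do_parallel and model_submit_with_retry are
hypothetical names, and it assumes "at most 5 times" means 5 attempts in
total.

```c
#include <errno.h>

#define MODEL_MAX_RETRIES 5   /* assumed: 5 attempts in total */

/* Hypothetical stand-in for a padata instance; not the kernel struct. */
struct model_instance {
    int reset_rounds;   /* upcoming calls that still see the reset flag */
    int calls;          /* number of submission attempts made so far */
};

/* Models padata_do_parallel(): -EBUSY while the reset flag is set, else 0. */
static int model_do_parallel(struct model_instance *inst)
{
    inst->calls++;
    if (inst->reset_rounds > 0) {
        inst->reset_rounds--;
        return -EBUSY;
    }
    return 0;
}

/* Retry while -EBUSY is returned, then report -EAGAIN instead of -EBUSY. */
static int model_submit_with_retry(struct model_instance *inst)
{
    int err = -EBUSY;

    for (int i = 0; i < MODEL_MAX_RETRIES && err == -EBUSY; i++)
        err = model_do_parallel(inst);

    /* Never hand -EBUSY to the caller, who would wait for a completion. */
    return err == -EBUSY ? -EAGAIN : err;
}
```

If the reset clears within the retry budget, the caller sees 0; otherwise
it sees -EAGAIN and never -EBUSY.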
> 
> Steffen, should we retry this at all? Or should it just fail as it
> did before?
> 
> Thanks,

It should be fine if we don't retry and just fail with -EAGAIN and let
the caller handle it. That should not break the meaning of the error code.
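
A rough user-space model of that fail-fast behaviour, for illustration
only. It is not kernel code: sketch_instance, sketch_request,
sketch_do_parallel, sketch_encrypt and sketch_caller_should_wait are all
hypothetical names, and the wait condition simply follows the description
in this thread (the caller waits when it sees 0 or -EBUSY).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's padata or aead structures. */
struct sketch_instance { bool resetting; };
struct sketch_request  { bool completed; };

/* Models padata_do_parallel(): rejects with -EBUSY during a reset. */
static int sketch_do_parallel(struct sketch_instance *inst,
                              struct sketch_request *req)
{
    if (inst->resetting)
        return -EBUSY;       /* rejected: no completion will ever fire */
    req->completed = true;   /* models the completion callback running */
    return 0;
}

/* No retry: translate the reset-time -EBUSY into -EAGAIN and fail. */
static int sketch_encrypt(struct sketch_instance *inst,
                          struct sketch_request *req)
{
    int err = sketch_do_parallel(inst, req);

    return err == -EBUSY ? -EAGAIN : err;
}

/* Caller side, as described in the thread: wait only on 0 or -EBUSY. */
static bool sketch_caller_should_wait(int err)
{
    return err == 0 || err == -EBUSY;
}
```

With the translation in place, a request rejected during a reset comes
back as -EAGAIN, so the caller does not wait on a completion that will
never be signalled.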
-- 
Best
GUO Zihua