On 2024/12/11 3:12, Daniel Jordan wrote:
> Hi Ridong,
>
> On Fri, Dec 06, 2024 at 11:48:36AM +0800, chenridong wrote:
>> On 2024/12/6 7:01, Daniel Jordan wrote:
>>> On Sat, Nov 23, 2024 at 08:05:09AM +0000, Chen Ridong wrote:
>>>> diff --git a/kernel/padata.c b/kernel/padata.c
>>>> index 5d8e18cdcb25..627014825266 100644
>>>> --- a/kernel/padata.c
>>>> +++ b/kernel/padata.c
>>>> @@ -319,6 +319,7 @@ static void padata_reorder(struct parallel_data *pd)
>>>>  	if (!spin_trylock_bh(&pd->lock))
>>>>  		return;
>>>>  
>>>> +	padata_get_pd(pd);
>>>>  	while (1) {
>>>>  		padata = padata_find_next(pd, true);
>>>>  
>>>> @@ -355,6 +356,7 @@ static void padata_reorder(struct parallel_data *pd)
>>>>  	reorder = per_cpu_ptr(pd->reorder_list, pd->cpu);
>>>>  	if (!list_empty(&reorder->list) && padata_find_next(pd, false))
>>>>  		queue_work(pinst->serial_wq, &pd->reorder_work);
>>>> +	padata_put_pd(pd);
>>>
>>> Putting the ref unconditionally here doesn't cover the case where reorder_work
>>> is queued and accesses the freed pd.
>>>
>>> The review of patches 3-5 from this series has a potential solution for
>>> this that also keeps some of these refcount operations out of the fast
>>> path:
>>>
>>> https://lore.kernel.org/all/20221019083708.27138-1-nstange@xxxxxxx/
>>>
>>
>> Thank you for your review.
>>
>> IIUC, patches 3-5 from this series aim to fix two issues:
>> 1. Avoid a UAF on pd (patch 3).
>> 2. Avoid a UAF on ps (patches 4-5).
>> What my patch 2 intends to fix is issue 1.
>>
>> Let's focus on issue 1.
>> As shown below, if reorder_work is queued, the refcount must be greater
>> than 0, since its serial work has not finished yet. Do you agree with
>> that?
>
> I think it's possible for reorder_work to be queued even though all
> serial works have finished:
>
>  - padata_reorder finds the reorder list nonempty and sees an object
>    from padata_find_next, then gets preempted
>  - the serial work finishes in another context
>  - back in padata_reorder, reorder_work is queued
>
> Not sure this race could actually happen in practice though.
>
> But, I also think reorder_work can be queued when there's an unfinished
> serial work, as you say, but with UAF still happening:
>
>   padata_do_serial
>     ...
>     padata_reorder
>       // processes all remaining
>       // requests then breaks
>       while (1) {
>         if (!padata)
>           break;
>         ...
>       }
>
>                                   padata_do_serial
>                                     // new request added
>                                     list_add
>
>       // sees the new request
>       queue_work(reorder_work)
>
>                                     padata_reorder
>                                       queue_work_on(squeue->work)
>
>                                   <kworker context>
>                                   padata_serial_worker
>                                     // completes new request,
>                                     // no more outstanding
>                                     // requests
>
>                                   crypto_del_alg
>                                     // free pd
>
>   <kworker context>
>   invoke_padata_reorder
>     // UAF of pd
>

Sorry for being busy with other work for a while. Thank you for your
patience.

In theory, the race does exist. Although I was unable to reproduce it
(I added a delay helper as below), I noticed that Herbert has reported
a UAF that occurred in the padata_parallel_worker function. Therefore,
it would be better to fix it with Nicolai's approach.

 static void padata_parallel_worker(struct work_struct *parallel_work)
 {
+	mdelay(10);
+

Hi, Nicolai, would you resend patch 3 to fix this issue? I noticed you
sent the patch two years ago, but this series has not been merged. Or
may I send a patch that aligns with your approach to resolve it?
Looking forward to your feedback.
>> pcrypt_aead_encrypt/pcrypt_aead_decrypt
>>   padata_do_parallel		// refcount_inc(&pd->refcnt);
>>     padata_parallel_worker
>>       padata->parallel(padata);
>>         padata_do_serial(padata);
>>           // pd->reorder_list	// enqueue on reorder_list
>>           padata_reorder
>>           - case 1: squeue->work
>>               padata_serial_worker	// drops the refcount
>>           - case 2: pd->reorder_work	// reorder->list is not empty
>>               invoke_padata_reorder	// this means refcnt > 0
>>               ...
>>               padata_serial_worker
>
> In other words, in case 2 above, reorder_work could be queued, another
> context could complete the request in padata_serial_worker, and then
> invoke_padata_reorder could run and UAF when there aren't any remaining
> serial works.
>
>> I think patch 3 (from Nicolai Stange) can also avoid the UAF on pd,
>> but it's complicated.
>
> For fixing the issue you describe, without regard for the reorder work,
> I think the synchronize_rcu from near the end of the patch 3 thread is
> enough. A synchronize_rcu in the slow path seems better than two
> atomics in the fast path.

Thank you. I tested with 'synchronize_rcu', and it fixes the issue I
encountered. As I mentioned, Herbert provided another stack trace,
which indicates that case 2 exists. I think it would be better to fix
it as patch 3 did.

Thanks,
Ridong