Hi Steffen,

I'm working on some padata patches and stumbled across this thread about the
purpose of the callback CPU in padata_do_parallel.

  https://lore.kernel.org/lkml/20100402112326.GA19502@xxxxxxxxxxx/

The relevant part is:

andrew> - Why would I want to specify which CPU the parallel completion
andrew>   callback is to be executed on?

steffen> Well, for IPsec for example it is quite interesting to separate the
steffen> different flows to different cpus. pcrypt does this by choosing different
steffen> callback cpus for the requests belonging to different transforms.
steffen> Others might want to see the object on the same cpu as it was before
steffen> the parallel codepath.

I'm not very familiar with IPsec, but I'm guessing the flows are separated for
performance reasons. Is the goal to keep multiple flows from interfering with
each other (ensuring they run on different CPUs), or to get better locality
(ensuring each flow always runs on the same CPU)? It'd be helpful if you could
expand on this.

By the way, the padata patches extend padata to parallelize work in more places
around the kernel, as Peter suggested:

  https://lore.kernel.org/lkml/20181106203411.pdce6tgs7dncwflh@xxxxxxxxxxxxxxxxxxxxxxxxxx/

Thanks,
Daniel