Correcting myself from an earlier post..

On 02/24/2014 04:38 PM, Joel Fernandes wrote:
>>> Also with respect to virt_dma (which is used by edma to manage all the
>>> descriptors and lists) there are too many lists: submitted, issued,
>>> completed etc and the descriptor moves from one to the other. I am
>>> thinking if there is a way we can avoid using so many lists and just
>>> have 2 lists and move the desc from one list to the other, that could
>>> avoid using the intermediate list altogether and classify dma requests
>>> as "done" or "not done".
>>
>> The reason I created separate submitted and issued lists is that it's
>> much easier to manage than having everything on a single list.
>>
>> We could deal with the submitted vs issued list, and that's to have the
>> channel store the cookie for the last issued descriptor - but I wonder
>> if it's worth the effort.
>>
>> What I'd suggest is to try some profiling, and post some profiling
>> results which show where the problems are, rather than pointing at
>> bits of code you might not particularly like.
>>
>
> Actually I did do some tracing earlier before I posted this thread, and
> noticed there were excessive traces of locking/unlocking. It is very light
> though as you pointed out, and lighter without debug options. The only other
> notable difference is the fact that we are now going through the dmaengine
> framework in the newer kernel vs the faster one.
>
> One more thing in my trace is omap_dma_sync being called repeatedly in
> memcpy_to_io for every barrier call, which is not necessary. I am working
> on a fix for this.
>
> On turning off DEBUG_KERNEL and running more tests, I do see some
> improvements however the throughput reduction is still =~ 10%.
>
> With a modified openssl speed test app, I sent 16-byte sized blocks
> repeatedly to the AES crypto hardware accelerator using EDMA:
>
> On v3.13.5 kernel:
> root@am335x-evm:~# openssl speed -evp aes-128-cbc -engine cryptodev
> engine "cryptodev" set.
> Doing aes-128-cbc for 3s on 16 size blocks: 79902 aes-128-cbc's
>
> With v3.2 kernel,
> Doing aes-128-cbc for 3s on 16 size blocks: 92314 aes-128-cbc's
>
> So we're able to encrypt around 13k more ops, or around 4.5k ops/second
> with 3.13.5

We're able to encrypt around 12.4k more ops (92314 vs 79902 over the 3s
run), or around 4.1k ops/second, with the older 3.2 kernel that didn't use
DMAEngine.

Regards,
-Joel
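
P.S. For anyone who wants to re-check the arithmetic, here is a small Python
sketch. Only the two op counts and the 3s duration come from the openssl
output above; the function and variable names are made up for illustration.

```python
# Op counts and duration copied from the two `openssl speed` runs above.
DURATION_S = 3
OPS_V3_13_5 = 79902  # v3.13.5 kernel (EDMA through DMAEngine)
OPS_V3_2 = 92314     # v3.2 kernel (no DMAEngine)


def compare(faster_ops, slower_ops, duration_s):
    """Return the gap between two runs of the same duration."""
    extra = faster_ops - slower_ops
    return {
        "extra_ops": extra,                       # total ops gained in the run
        "extra_ops_per_s": extra / duration_s,    # gain per second
        "drop_pct": 100.0 * extra / faster_ops,   # slowdown relative to faster run
    }


if __name__ == "__main__":
    r = compare(OPS_V3_2, OPS_V3_13_5, DURATION_S)
    print("extra ops:      %d" % r["extra_ops"])
    print("extra ops/sec:  %.0f" % r["extra_ops_per_s"])
    print("throughput drop: %.1f%%" % r["drop_pct"])
```

For this particular 16-byte run that works out to 12412 more ops on v3.2,
a bit over 4.1k ops/second, or a drop of roughly 13.4% on v3.13.5.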