The following patch series rewrites the DMA code to be cleaner and faster.
Earlier, only a single SG was used for DMA, and the SG list passed from the
crypto layer was copied and DMA'd one entry at a time. This turned out to be
quite inefficient and required a lot of code. We replace it with a much
simpler approach that passes the SG list from the crypto layer directly to
the DMA layer wherever possible. In all cases where such direct passing of
the SG list is not possible, we create a new SG list and do the copying.
This is still better than before, as we create an SG list as big as needed
and not just a 1-element list.

We also add PIO mode support to the driver, and switch to it whenever a DMA
channel cannot be allocated. PIO mode has also been shown to give good
performance for small blocks, as shown below.

Tests have been performed on AM335x, OMAP4 and AM437x SoCs. Below is a
sample run on an AM335x SoC (BeagleBone board), showing the performance
improvement (20% for 8K blocks):

With DMA rewrite (key size = 128-bit):
  16 byte blocks: 4318 operations in 1 seconds (69088 bytes)
  64 byte blocks: 4360 operations in 1 seconds (279040 bytes)
  256 byte blocks: 3609 operations in 1 seconds (923904 bytes)
  1024 byte blocks: 3418 operations in 1 seconds (3500032 bytes)
  8192 byte blocks: 1766 operations in 1 seconds (14467072 bytes)

Without DMA rewrite:
  16 byte blocks: 4417 operations in 1 seconds (70672 bytes)
  64 byte blocks: 4221 operations in 1 seconds (270144 bytes)
  256 byte blocks: 3528 operations in 1 seconds (903168 bytes)
  1024 byte blocks: 3281 operations in 1 seconds (3359744 bytes)
  8192 byte blocks: 1460 operations in 1 seconds (11960320 bytes)

With PIO mode, good performance is observed for small blocks:
  16 byte blocks: 20585 operations in 1 seconds (329360 bytes)
  64 byte blocks: 8106 operations in 1 seconds (518784 bytes)
  256 byte blocks: 2359 operations in 1 seconds (603904 bytes)
  1024 byte blocks: 605 operations in 1 seconds (619520 bytes)
  8192 byte blocks: 79 operations in 1 seconds (647168 bytes)

Future work in this direction would be to switch dynamically between PIO
and DMA mode based on the block size.

Changes since last series:
 * Unaligned cases for omap-aes are handled with the patch:
   "Add support for cases of unaligned lengths"
 * Support for the AM437x SoC is added and tested.
 * Changes following review comments on the debug patch.

Note: The debug patch "crypto: omap-aes: Add useful debug macros" will
generate a checkpatch error, which cannot be fixed. Refer to the patch for
the error message and the reasons why it cannot be fixed. Thanks.

Joel Fernandes (14):
  crypto: scatterwalk: Add support for calculating number of SG elements
  crypto: omap-aes: Add useful debug macros
  crypto: omap-aes: Populate number of SG elements
  crypto: omap-aes: Simplify DMA usage by using direct SGs
  crypto: omap-aes: Sync SG before DMA operation
  crypto: omap-aes: Remove previously used intermediate buffers
  crypto: omap-aes: Add IRQ info and helper macros
  crypto: omap-aes: PIO mode: Add IRQ handler and walk SGs
  crypto: omap-aes: PIO mode: platform data for OMAP4/AM437x and trigger
  crypto: omap-aes: Switch to PIO mode during probe
  crypto: omap-aes: Add support for cases of unaligned lengths
  crypto: omap-aes: Convert kzalloc to devm_kzalloc
  crypto: omap-aes: Convert request_irq to devm_request_irq
  crypto: omap-aes: Kconfig: Add build support for AM437x

 crypto/scatterwalk.c         |   22 ++
 drivers/crypto/Kconfig       |    2 +-
 drivers/crypto/omap-aes.c    |  466 +++++++++++++++++++++++------------------
 include/crypto/scatterwalk.h |    2 +
 4 files changed, 284 insertions(+), 208 deletions(-)

-- 
1.7.9.5
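
For illustration only, the PIO/DMA switching mentioned as future work could
look roughly like the sketch below. It is not part of this series. The names
xfer_mode, pick_mode and PIO_MAX_BYTES are hypothetical and do not come from
the omap-aes driver, and the 64-byte threshold is an assumption read off the
AM335x numbers, where PIO is ahead at 16 and 64 byte blocks and DMA is ahead
from 256 bytes up. The real crossover would have to be measured per SoC.

```c
#include <stddef.h>

/* Hypothetical names for illustration; not taken from omap-aes.c. */
enum xfer_mode { XFER_PIO, XFER_DMA };

/*
 * Assumed crossover point. In the AM335x run, PIO wins at 16 and 64
 * byte blocks and DMA wins at 256 bytes and above, so the true
 * crossover lies somewhere between 64 and 256 bytes.
 */
#define PIO_MAX_BYTES 64

/*
 * Pick a transfer mode for a request of nbytes.
 * Without a DMA channel, PIO is the only option (this mirrors the
 * fallback the series adds). Otherwise small requests use PIO and
 * larger ones use DMA.
 */
static enum xfer_mode pick_mode(size_t nbytes, int dma_available)
{
	if (!dma_available)
		return XFER_PIO;

	return nbytes <= PIO_MAX_BYTES ? XFER_PIO : XFER_DMA;
}
```

With this threshold, pick_mode(16, 1) and pick_mode(64, 1) select PIO,
pick_mode(256, 1) selects DMA, and any size falls back to PIO when no DMA
channel is available.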